Chapter 4. Infrastructure as Code Tools and Languages

Systems administrators have been using scripts, configuration, and tools to automatically provision and configure infrastructure for decades. Infrastructure as Code emerged into the mainstream with the adoption of virtualization and the cloud, and has evolved considerably along with those technologies.

This chapter discusses infrastructure code techniques, languages, and tools. Whichever automation tool you currently use, understanding the landscape and evolution of tools will help you make informed decisions about how to design, implement, and deliver infrastructure. And the evolution of infrastructure tools and techniques is far from over, so every team will need to continually evaluate new tools and adopt new techniques as options emerge.

The first topic is the essentials of Infrastructure as Code, including comparison with alternatives like ClickOps and task-focused scripting. We’ll discuss the types of things that can be configured as code and the consequences of a code-based approach to infrastructure configuration.

The next topic is infrastructure code processing models. We need to think about the differences between editing infrastructure code, assembling it, processing it, applying it, and the resulting deployed infrastructure resources. This is important context for understanding how infrastructure code tools work, as well as for making decisions about how to design and deliver infrastructure.

The context of these processing models informs the following discussion of infrastructure tools and their approaches to languages, imperative versus declarative languages and tools, and “real” programming languages versus domain-specific languages (DSLs). These models also help to understand why some tools use external state files.

The final part of this chapter looks at some emerging models for infrastructure automation, potentially the next stage in the evolution of Infrastructure as Code.

Coding Infrastructure

There are simpler ways to provision infrastructure than writing code and then using a tool to apply it to an IaaS API. You could follow the ClickOps approach by opening the platform’s web-based user interface in a browser and poking and clicking an application server cluster into being. Or you could embrace CLIOps by opening a prompt and using your command-line prowess to wield the IaaS platform’s command-line interface (CLI) tool to forge an unbreakable network boundary. However, building infrastructure by hand, whether with a GUI or on the command line, creates systems that are difficult to maintain and keep consistent.

Moving Beyond Task-Based Scripting

Systems administration has always involved automating tasks by writing scripts, and often developing more complex in-house tools using languages like Perl, Python, and Ruby. Task-focused scripting focuses on activities within an overall process, such as “create a server” or “add nodes to a container cluster.” A symptom of task-focused scripting is wiki pages and other documentation that list steps to carry out the overall process, including guidance on what to do before running the script, arguments to use when running it, and then steps to carry out after it finishes.

Task-focused scripting usually needs someone to actively make decisions to carry out a complete process. Infrastructure as Code, on the other hand, pulls decision-making knowledge into the act of writing definitions, specifications, code, and tests. Provisioning, configuring, and deploying systems then become hands-off processes that can be left to the machines. Getting this right creates capabilities like automated scaling, recovery, and self-service platform services.

Capturing decision making in code and specifications also creates the opportunity for more-effective knowledge sharing and governance. Anyone can understand how the system is implemented by reading the code. Compliance can be assured and proven by reviewing the history of code changes. People can use the code and its history to troubleshoot and fix issues.

Understanding What You Can Define as Code

The “as code” paradigm works for many parts of infrastructure as well as elements that are infrastructure-adjacent. Here’s a partial list of what you might consider defining as code:

IaaS resources: Collections of infrastructure resources provisioned on an IaaS platform are defined and managed as deployable infrastructure stacks.¹
Servers: Managing the configuration of OSs and other elements of servers was the focus of the first generation of Infrastructure as Code tools; Chapter 11 covers this topic further.
Hardware provisioning: Even most physical devices can be provisioned and configured using code. “Creating a Server by Using Network Provisioning” describes automating bare-metal server provisioning. SDN can be used to automate the configuration of networking devices like routers and firewalls.
Application deployments: Application deployment has largely moved away from procedural deployment scripts in favor of immutable containers and declarative descriptors.
Delivery pipelines: CD pipeline stages for building, deploying, and testing software can and should be defined as code (see Chapter 16).
Platform service configuration: Other services such as monitoring, log aggregation, and identity management should ideally be configured as code. When using a platform service provided by your cloud vendor, you can provision and configure it by using the same infrastructure-coding tools you use for lower-level infrastructure. But even when using a platform service from a separate SaaS provider, or through a packaged application deployed on your IaaS platform, you should configure it by using APIs and code as well. Many infrastructure deployment tools like Terraform and Pulumi support plug-ins and extensions to configure third-party software as well as IaaS and SaaS resources. Configuring platform services as code has the added benefit of making it easier to integrate infrastructure with other resources such as monitoring and DNS.
Tests: Tests, monitoring checks, and other validations should all be defined as code. See Chapter 17 for more.

Code or Configuration?

What is the difference between “as code” and configuration? A working definition for this book is that code specifies those aspects of a component that are consistent across multiple deployed instances of the component. Configuration specifies those aspects that are specific to a particular deployed instance of a component. For example, the FoodSpin team uses infrastructure code to define an application server as a VM provisioned with an Ubuntu, Java, JBoss, and various other packages and settings. The team uses configuration to specify a different amount of system memory to allocate to the VM in each environment.

These definitions are imperfect and, honestly, not used consistently. I might mention a JBoss “configuration file” that is copied to every instance of the application server, even if it’s the same for every server. But I would talk about Ansible “code” used to dynamically generate that file on each server.

Some people describe all infrastructure code as configuration, especially code not written in a “real” programming language like Python. However, complex declarative infrastructure codebases are fundamentally different from simple property files. Even if it’s written in a “not-real” language like YAML, a complex codebase needs to be managed with the same discipline and governance as “actual” code written in Java. Later, this chapter offers more useful terminology for describing types of Infrastructure as Code languages.

Treat Infrastructure Code Like Real Code

Many infrastructure codebases evolve from configuration files and utility scripts into unmanageable messes. Too often, people don’t consider infrastructure code to be “real” code. They don’t give it the same level of engineering discipline as application code. To keep an infrastructure codebase maintainable, you need to treat it as a first-class concern.

Design and manage your infrastructure code so that it is easy to understand and maintain. Follow code-quality practices, such as code reviews, pair programming, and automated testing. Your team should be aware of technical debt and strive to minimize it.

Chapter 5 describes how to apply various software design principles to infrastructure, such as improving cohesion and reducing coupling. “Organizing Projects in a Codebase” explains ways to organize and manage infrastructure codebases to make them easier to work with.

Defining Configuration as Code

Many tools, including some infrastructure-automation tools, manage configuration with a closed-box model: users edit and manage configuration via a UI or perhaps an API, but the tool manages all access to the data files where the configuration is stored. This closed-box approach offers benefits, like checking that configuration follows the correct syntax. However, this approach sacrifices many benefits of externalizing configuration.

For example, some packaged platform services like CI servers and monitoring servers were originally released as closed-box tools but later added the capability to define aspects of their configuration as code.²

Externalized configuration can be stored in a standard version-control system. The configuration code can be organized in useful ways, such as putting an application’s pipeline configuration with the application code and the code for the tests the pipeline runs. The following are other examples of things you can do with externalized configuration that you can’t do with closed-box configuration:

Use a software development IDE or text editor with advanced syntax highlighting, refactoring, and AI coding assistance.
Edit the configuration together with associated code and configuration, potentially with automated support for ensuring that dependencies across them are correct.
Automatically trigger CI or pipeline stages to run in order to test when a configuration change is made and to give fast feedback on errors.
Track and record configuration and code changes in versions across different but related tools and systems. For example, understand the configuration settings used when a particular version of an application’s code was deployed.

For infrastructure in particular, specifying infrastructure components as code enables teams to use Agile engineering practices like TDD, CI, and CD pipelines.

Managing Your Code in a Source Code Repository

Code and configuration for infrastructure and other system elements should be stored in a source code repository—also called a version control system (VCS) or source code management (SCM) system. These systems provide loads of useful features, including tracking changes, comparing versions, and recovering and using old revisions when new ones have issues. They can also trigger actions automatically when changes are committed, which is the enabler for CI jobs and CD pipelines.

Secrets in Source Control

One thing that you should not put into source control is unencrypted secrets, such as passwords and keys. Even if your source code repository is private, its history and revisions of code are too easily leaked. Secrets leaked from source code are one of the most common causes of security breaches. See “Handling Secrets” for better ways to manage secrets.

Infrastructure Code Processing

The ability to treat infrastructure like software opens many possibilities, such as applying well-proven software design principles and patterns to our infrastructure architecture. But differences between how infrastructure code and application code work can be confusing, leading us to force-fit concepts and techniques even when they’re not appropriate. It’s useful to consider the differences between infrastructure code and application code.

For example, refactoring application code is usually straightforward: edit the code, compile, and deploy. However, refactoring infrastructure code in an editor is disconnected from the way the changes will be applied to real infrastructure. An IDE gives you control to carefully update references to a class when you change it. Controlling the way the code is applied to change an IaaS resource is a different matter.

Understanding When Code Executes

An application’s code executes after it’s deployed and running. Figure 4-1 shows a basic lifecycle for software code.

Application code is edited by developers, then compiled and built to create a package, and then deployed to its runtime environment. The code that developers write executes as processes that run after it’s deployed, handling requests and events or doing whatever it is that it does.

The basic workflow for infrastructure code is simpler. Developers write and edit code, then deploy it. As shown in Figure 4-2, the code is executed during deployment, not in the runtime environment.

Executing the infrastructure code creates, modifies, or removes infrastructure resources deployed in the IaaS platform. The outcome is the new state of those resources.

This difference may seem obvious, but it has implications for things like testing. If we write a unit test for infrastructure code, does it tell us about the infrastructure our code will create, or does it tell us only about what happens when the infrastructure code is executed? If our code creates networking structures, can we write unit tests that prove that those structures will route traffic the way we expect?

More confusion comes with tools that compile infrastructure code we write in one language into another language. For example, AWS CDK allows developers to write infrastructure code in application development languages like Python and JavaScript, and then compile it to CloudFormation templates. This compiled code is an extra level of abstraction between the code you write and the resources provisioned.

Processing and Deploying Infrastructure Code

The full context of what happens between an engineer writing infrastructure code and having infrastructure resources available for use is more complex than the one-step process shown in Figure 4-2. Infrastructure as Code tools implement code processing and deployment in various ways, and, as discussed in “Build Stage: Preparing for Distribution”, there are different ways to implement infrastructure code processing across a broader delivery workflow.

Although the Deploy step for infrastructure code is typically run as a single command (e.g., terraform apply), several actions happen. Figure 4-3 breaks the deployment into three substeps.

The three substeps of using infrastructure code to provision or change resources are as follows:

1. Assemble: Collate the infrastructure code files and any dependencies like libraries and plug-ins. Some file or code transformations could also happen at this point. The result is a build, which could be files in a temporary working folder or a packaged artifact.
2. Compile: The infrastructure tool compiles and runs the code from the build files to generate the desired state model, a data structure that reflects what the infrastructure resources should look like for the instance the code is being deployed to. Most often the infrastructure tool builds this model in memory, although in some cases it may generate interim files, as when the AWS CDK generates CloudFormation or Terraform code, or Bicep generates JSON Azure Resource Manager (ARM) files.
3. Execute: The infrastructure tool uses the infrastructure platform’s IaaS API to examine the current state of the infrastructure, compares it with the desired state model generated in the Compile substep, and either changes the deployed infrastructure resources to match it or generates a preview report of what it would change.

Although these three substeps may not always be explicit and may not perfectly describe the way a particular infrastructure tool implements the deployment process, they’re a useful framing for how the deployment process works and can help when troubleshooting.

Previewing Changes

Some infrastructure tools have a plan command (Terraform and OpenTofu) or preview command (Pulumi) that compares the desired state generated in the Compile substep with the existing infrastructure resources. These preview commands run the Execute substep of the deployment process, but use only the IaaS API to determine the current state of the resources, not to change them.

An engineer can use a preview to see any changes that will be made, to reassure themselves that they understand what the code will do before they apply it. However, in some situations, applying code can have effects that aren’t revealed by previewing the change.

Tracing Infrastructure Code Execution

The three-step deployment process model helps you understand that tracing, debugging, profiling, and potentially even unit testing infrastructure code may not work the way you expect. The infrastructure code you write, especially with general-purpose languages like TypeScript or Python when using CDK or Pulumi, executes at the Compile substep that generates the desired state model, not at the Execute substep that transforms the infrastructure.

So you might expect a code debugger or profiler to follow each line of infrastructure code and the interaction with the IaaS API and changes to infrastructure resources that the code causes. What’s more likely is that it will first trace the creation of the desired state model, and only later trace the API calls, without clearly correlating the two. For example, a memory profiler won’t tell you how much memory is used by each part of your infrastructure code the way it would for application code.³

Unit tests are useful to make assertions about the desired-state data structures the code generates, but are less useful for asserting how the resources will be modified or will operate afterward. Chapter 17 describes how to use various types of tests most effectively.

It’s especially easy to get muddled when refactoring infrastructure code. When you refactor application code, you build a new executable that you can deploy with only the expected changes to its behavior. Deploying refactored infrastructure code can make unexpected changes to the running infrastructure, such as destroying a storage device with critical data. See Chapter 20 for more on making changes to infrastructure.

Managing Infrastructure State

Infrastructure tools need a way to know which resource defined in code corresponds to which resource instance running on the IaaS platform. This mapping is used to make sure a change to the code is applied to the correct resource. IaaS platforms can handle this internally for their own infrastructure tools. For instance, when you run AWS CloudFormation, you pass an identifier for the stack instance, which the AWS API uses as a reference to an internal data structure that lists the resources belonging to that stack instance.

Tools from third-party vendors (like Terraform, OpenTofu, and Pulumi) need their own data structures to manage these mappings of code to instances. They store these data structures in a state file for each instance. Users of these tools have the option to maintain storage for these state files or to store state data on hosted services.⁴^,⁵

Although having instance state handled transparently by the IaaS platform carries advantages, viewing and even editing state data structures to debug and fix issues can be useful. Chapter 20 discusses techniques for managing changes with state files.

Infrastructure as Code Tools

Systems administrators have always used interpreted languages like Bash, Perl, Ruby, and Python to write scripts and tools to manage their infrastructure. In the 1990s, CFEngine pioneered the use of a declarative, idempotent, DSL for infrastructure management. Puppet and then Chef emerged alongside mainstream server virtualization and the IaaS cloud in the 2000s, both heavily inspired by CFEngine. Ansible , Salt Project , and others followed, also focusing on configuring servers during and after OS installation.

As IaaS cloud platforms emerged with APIs for managing resources, pioneered by AWS, many people reverted to writing procedural code in languages like Ruby and Python. We used SDKs to interact with the IaaS platform API, perhaps using a library like boto or fog.

Stack-oriented tools like Terraform and AWS CloudFormation emerged in the 2010s, using a declarative DSL model similar to that used by server-oriented tools like Puppet and Chef. OpenTofu is a more recent open source fork of Terraform. Azure’s Bicep, Google’s Cloud Deployment Manager , and OpenStack’s Heat are stack-oriented tools that use declarative DSLs for working with those specific IaaS platforms. BOSH is an infrastructure tool from CloudFoundry.

Some more recent stack-oriented tools use general-purpose programming languages rather than a declarative DSL to define infrastructure. Pulumi and AWS CDK support languages like TypeScript, Python, and Java.

Other tools have experimented with different interfaces for infrastructure code, such as IaSQL and StackQL which, as their names suggest, allow you to use SQL to query and manage IaaS resources. As of this writing, these types of tools don’t seem to have much traction, but they’re interesting experiments if nothing else.

Another genre of infrastructure tools uses an approach called infrastructure as data (IaD) for deploying infrastructure code from a Kubernetes cluster. With these tools, infrastructure code is usually written declaratively in YAML and then continually synchronized using a control loop. Examples of these tools include AWS Controllers for Kubernetes (ACK), Crossplane, Config Connector, and Azure Service Operator. Chapter 19 discusses approaches for deploying infrastructure code, including infrastructure as data (see “Infrastructure as Data”).

An emerging class of infrastructure coding is infrastructure from code (IfC), which embeds infrastructure code into application code. Examples of IfC tools include Ampt, Ballerina, Darklang, Klotho, Winglang, and Nitric.

Infrastructure tools support various languages and types of languages, as discussed in the next section. Tools also differ in the IaaS platforms they support. The tools provided by IaaS platform vendors usually support their own platform, like AWS CloudFormation and CDK, and Azure ARM and Bicep. Many tools, including Terraform, OpenTofu, and Pulumi, support multiple IaaS platforms.

Cloud-Portable Infrastructure Code

Most infrastructure coding tools that support multiple IaaS platforms require code to be written for a specific platform. That is, you can write Terraform code for AWS, Azure, Google Cloud, or another platform, but you can’t write code that will run on more than one platform.

Some tools, particularly IfC tools,⁶ allow developers to use cloud-portable abstractions. For example, you can write code to create and use an object_storage resource. The tool then calls the relevant platform-specific code to create an Amazon S3 bucket, Azure Block Blob, or whatever the equivalent resource is for the platform the code is deployed to. For example, the following pseudocode creates a storage bucket and stores the menus for FoodSpin’s customer restaurants in a file for each restaurant:

StorageBucket menuBucket = new cloud.StorageBucket();

for menu in foodspinMenuList {
	menuBucket.store(
		"/restaurants/" + menu.getRestaurantName(),
		menu.getContents()
	)
}

When you use a cloud-portable tool, you need code-abstraction implementations for each abstraction you will use for the cloud platforms you want to run on. If the vendor or third party doesn’t supply the abstractions, you will need to develop your own cloud-specific infrastructure code.

A drawback of cloud-portable abstractions is that they typically support only features common across all cloud platforms. The “lowest common denominator” for cloud resources may limit your ability to get the most value from features provided by your cloud vendors.

Types of Languages for Coding Infrastructure

Apart from which IaaS platforms they support, Infrastructure as Code tools are distinguished by the types of languages they use for defining infrastructure. Many people who work with infrastructure have strong opinions about which languages are best for working with infrastructure. The reality is that the best language will vary for different systems, for different parts of a system, and for different people. For example, a software developer may prefer to use a general-purpose, imperative language like TypeScript to define runtime infrastructure at a high level of abstraction. Systems administrators often prefer declarative DSLs that allow them to specify infrastructure at a low level.

These four groups of opposing characteristics provide a lot of context for considering which tool is right for your system and your team: procedural versus idempotent, declarative versus imperative, general-purpose versus DSL, and low-level versus high-level. Keep in mind that the right choice may be to use different languages for different parts of your system rather than trying to force one tool and language to solve all your use cases.

Procedural and Idempotent Code

Many task-focused scripts are written to be run only when an action needs to be carried out or a specific change made, but can’t be safely run multiple times. For example, this script creates a virtual server by using a fictional infrastructure tool called stack-tool:⁷

stack-tool create-virtual-server \
  --name=my-server \
  --os_image=ubuntu-22.10 \
  --memory=4GB \
  --disk=80GB

If you run this script once, you get one new server. If you run it three times, you get three new servers.⁸

We can change the script to check whether the server name exists and refuse to create a new one if so. The following snippet runs the fictional stack-tool find-virtual-server command and creates a new server only if the code exits with a value of 1:

stack-tool find-virtual-server --name=my-server
if [ "$?" = "1" ] ; then
  echo "Creating a new server"
  stack-tool create-virtual-server \
    --name=my-server \
    --os_image=ubuntu-22.10 \
    --memory=4GB \
    --disk=80GB
else
  echo "Server already exists, not creating a new one"
fi

The script is now idempotent. No matter how many times we run it, the result is the same; a single server named my-server. If we configure an automated process to run this script continually, we can be sure the server exists. If the server doesn’t already exist, the process will create it. If someone destroys the server, or if it crashes, the process will restore it.

But what happens if we decide the server needs more memory? We can edit the script to change the argument --memory=4GB to --memory=8GB. But if the server already exists with 4 GB of RAM, the script won’t change it. We could add a new check, taking advantage of convenient options available in the imaginary stack-tool:

stack-tool find-virtual-server --name=my-server --print-id > saved-server-id
if [ "$?" = "1" ] ; then
  echo "Creating a new server"
  stack-tool create-virtual-server \
    --name=my-server \
    --os_image=ubuntu-22.10 \
    --memory=8GB \
    --disk=80GB
else
  echo "Server already exists, not creating a new one"
fi
CURRENT_RAM=`stack-tool get-virtual-server-info \
  --id=$(cat saved-server-id) \
  --attribute=memory`
if [ "${CURRENT_RAM}" != "8GB" ] ; then
  stack-tool modify-virtual-server \
    --id=$(cat saved-server-id) \
    --memory=8GB
fi

The modified script now captures the ID of the existing virtual server, if found, into a file. It then passes the ID to stack-tool with a command to change the existing server’s memory allocation to 8 GB. I’ve also changed the script so that if the server doesn’t already exist, the command to create a new one creates it with the right memory setting.

Now we have an idempotent script that ensures the server exists with the right amount of memory. However, scripts like this become messy over time as needs change and more conditionals are added.

An alternative would be to write a new, separate script to change the memory of the existing server. Each script is simpler than the one that combines creating and modifying the server. However, you then get a proliferation of scripts.

Writing task-focused scripts comes naturally for systems administrators who maintain systems by carrying out tasks as needed. But the approach depends on humans to make decisions about which scripts to run and when, depending on their knowledge of the current state of the system.

The core premise of Infrastructure as Code is that humans can put their knowledge and decision making into the code ahead of time, so they can stay hands-off and leave it to the machines to carry out the work.⁹ So most modern Infrastructure as Code languages work idempotently, even the imperative ones.

Imperative and Declarative Languages and Tools

Imperative code is a set of instructions that specifies how to make a certain action happen. Declarative code specifies what you want, without specifying how to make it happen. Most popular programming languages are imperative, including procedural languages like Python and object-oriented languages like Java. Configuration languages like YAML and XML are declarative,¹⁰ as are set-based languages like SQL and functional languages like Clojure.

This example uses declarative code to create the same virtual server instance as the earlier examples:

virtual_server:
  name: my_server
  os_image: ubuntu-22.10
  memory: 8GB

This code doesn’t include any logic to check whether the server already exists or, if it does exist, how much memory it currently has. The infrastructure tool takes care of that when it applies the code to the deployed infrastructure. If you edit the code and change the amount of memory, then run it again, the tool will (probably) modify the existing instance rather than create a new one.¹¹

The benefit of declarative infrastructure code is its simplicity and clarity. Many infrastructure code tools, including Ansible, Bicep, CloudFormation, Puppet, OpenTofu, and Terraform use declarative languages.

Other tools, including CDK, Pulumi, and most IfC tools, support imperative languages. (Pulumi also supports at least one declarative language, YAML). It’s important to note, however, that most of these tools use a declarative model. The procedural or object-oriented infrastructure logic creates data structures that model the desired state of the infrastructure, which the tool then applies to create or modify the infrastructure resources running in the platform.

This code example creates a virtual server, using a conditional statement to set the memory size:

appserverMemory = switch (serverSizeParameter) {
	"S": "1GB"
	"M": "4GB"
	"L": "8GB"
	"XL": "16GB"
}
VirtualServer appServer = VirtualServer.new (
	name: "my_server",
	os_image: "ubuntu-22",
	memory: appserverMemory
)

The code in this example is idempotent, without needing to explicitly implement any checks to see whether the instance already exists and has the specified amount of memory. The code runs in the Execute step of the infrastructure code processing model described earlier. The tool then executes the Apply step, interacting with the IaaS API to take whatever actions are needed to create or change the VM.

Although it uses an imperative language, this example is trivial to implement in a declarative language. The HashiCorp Configuration Language (HCL) used by Terraform and OpenTofu includes expressions, sets, and other constructs that offer enough flexibility for most situations. See the HCL native syntax specification and the Terraform documentation for more.

Declarative languages get messier when used to write more-complex components, and especially when used to create configurable components. On the other hand, using a fully imperative language for straightforward infrastructure definitions often creates code that is unnecessarily verbose. This is why I recommend not trying to use a single language across a large, complex infrastructure estate.

Domain-Specific and General-Purpose Languages

Many infrastructure tools use their own DSL.¹² A DSL is a language designed to model a specific domain, in our case infrastructure. This makes it easier to write code and makes the code easier to understand because it closely maps the things you’re defining.

Ansible, Chef, and Puppet each have a DSL for configuring servers. Their languages provide constructs for concepts like packages, files, services, and user accounts. Here is a pseudocode example of a server configuration DSL:

package: jdk
package: tomcat

service: tomcat
  port: 8443
  user: tomcat
  group: tomcat

file: /var/lib/tomcat/server.conf
  owner: tomcat
  group: tomcat
  mode: 0644
  contents: $TEMPLATE(/src/appserver/tomcat/server.conf.template)

This code ensures that two software packages are installed, jdk and tomcat. It defines a service that should be running, including the port it listens to and the user and group it should run as. Finally, the code specifies that a server configuration file should be created using the template file server.conf.template. The example code is pretty easy for someone with systems administration knowledge to understand, even if they don’t know the specific tool or language.

Many IaaS infrastructure tools also use DSLs, including Terraform, OpenTofu, and CloudFormation. These DSLs model the IaaS resources so that you can write code that refers to virtual servers, disk volumes, and network routes.

Many infrastructure DSLs are internal DSLs written as a subset (or superset) of a general-purpose programming language. Chef is an example of an internal DSL written as Ruby code, allowing access to the fully Ruby language and libraries. Others are external DSLs, which are interpreted by code written in a different language. Terraform HCL is an external DSL not related to the Go language its interpreter is written in. Puppet, like Chef, is written in Ruby, but Puppet code isn’t Ruby code and can’t use general-purpose features of Ruby.

Some infrastructure tools, notably AWS CDK and Pulumi, enable you to write infrastructure code by using a general-purpose language (GPL). GPLs can be used across various domains and problem spaces. Developers can choose a language they’re familiar with from developing software, such as JavaScript, TypeScript, Python, Go, or C#. These languages have a large ecosystem of tools with mature support, including IDEs, linters, and unit-testing frameworks.

Many GPLs are imperative, although, as mentioned earlier, they’re used to build a declarative model to apply to infrastructure. Some infrastructure tools use general-purpose markup languages like YAML (Ansible, CloudFormation, anything related to Kubernetes) and JSON (Packer, CloudFormation). Many third-party tools are available for working with these markup languages. But because declarative languages lack type systems, the tools for working with them aren’t as powerful as those for imperative languages.

Low-Level and High-Level Languages

Most Infrastructure as Code languages directly model the resources they configure. The languages used with tools like Terraform and Pulumi are essentially thin wrappers over IaaS APIs. For example, the Terraform aws_instance provider directly maps to the AWS API run_instances API method.¹³

Many teams use infrastructure code languages to build their own abstraction layers via libraries or modules. Chapter 10 describes several patterns and antipatterns for writing libraries or modules to wrap the gritty details of, for example, wiring up network routes.

A few tools support higher-level abstractions for infrastructure, often as components at the level of libraries, deployable components, or compositions. Chapter 6 discusses components for infrastructure code at various levels.

Infrastructure from Code

Infrastructure from Code , as mentioned earlier, mixes infrastructure code within application code. Some implementations provide support for including infrastructure code with existing general-purpose software development languages, while others introduce a new language, often with dedicated development and testing environments. As of this writing, IfC is not a mainstream approach.

I generally recommend separating concerns between applications and infrastructure, which IfC tools may contradict. For example, imagine a developer writing code to save a new customer’s registration information in a database. Code that handles both the registration workflow as well as details of provisioning and configuring the database to store it would be difficult to understand and maintain.

However, the intent of these languages is not to intermix code at the level normally written in Terraform with business logic. Instead, business logic code can specify the relevant attributes of infrastructure at the right level of detail for the context. The user registration code can specify that the data should be saved to a database configured for handling personal customer information. The code calls a separately written library that handles the details of provisioning and configuring the database appropriately.

So the system can be designed to separate the concerns of business logic and detailed infrastructure configuration, while explicitly managing those concerns that are relevant across those boundaries. The current paradigm of defining and deploying infrastructure separately relies on out-of-band knowledge to know that the database needs to be configured for personal data. This approach also involves brittle integration points that we need to configure explicitly on both sides, such as connectivity and authentication.

Integrating application and infrastructure development means we can redraw the boundaries of applications vertically, aligning infrastructure with the logic it supports, rather than horizontally.¹⁴

Post-Code Infrastructure Automation

Many organizations are running into limitations of the Infrastructure as Code model, at least as practiced today.¹⁵ Many experiments and innovations are aimed at moving beyond the current code-oriented model.

The basic premise is often to represent the current and desired state of infrastructure in a data model or graph. Engineers, users, and executable code may be able to interact with the model in different ways, including using “traditional” Infrastructure as Code tools, a GUI, API, or perhaps other modes still to be explored. System Initiative and ConfigHub are developing tools along these lines.¹⁶

At first glance these solutions look similar to the ClickOps approach that I disparaged at the start of this chapter. However, extensible implementations have the potential to handle many of the limitations of ClickOps. For example, users can use the interactive interface to prepare a change set and carry out checks and approvals before applying it to the provisioned resources. This suggests we would be able to implement tests and other validations, support multiple people working on a system concurrently, and potentially replicate changes consistently across environments.

Having a dynamic model that is updated from the real infrastructure in real time would shrink the feedback loop for working on changes. While working on a potential change, an engineer can refresh the model to see new changes to the live system and how they will impact their work.

Post-code infrastructure automation solutions are still unproven, essentially experiments. But it’s heartening to see people exploring ways to advance our ways of working. It’s important to be aware that the current state of code-based infrastructure management approaches and tools is only a step on a journey.

Conclusion

The topics in this chapter could be considered meta concepts for Infrastructure as Code. However, it’s important to keep in mind that the goal of making routine tasks hands-off for team members is what makes Infrastructure as Code more powerful than writing scripts to automate tasks within a hands-on workflow. Considering how language attributes affect the maintainability of infrastructure code helps you select more useful tools for different jobs in your system.

This chapter wraps up the foundational content of the book (Part I). The following chapters discuss the architecture and design of building infrastructure systems as code (Part II).

¹ The term infrastructure stacks is defined in Chapter 6.

² The Jenkins CI server’s Pipelines as Code feature is a prominent example of configuration as code.

³ A team I worked with made this discovery when trying to understand how to optimize Terraform project code that was leading to out-of-memory errors when running apply. The solution they came to wasn’t to optimize the code but to break it into smaller projects.

⁴ The Terraform license may discourage vendors from providing services such as storing state files to users of its tools.

⁵ Many of the Terraform Automation and Collaboration Software (TACOS) services described in Chapter 19 can manage state files. See “Using an Infrastructure Code Deployment Service”.

⁶ Winglang and Nitric are the main examples I know about as of late 2024. I’m sure there are others.

⁷ I use fictional tools, languages, and platforms throughout this book. Imaginary tools are nice because their features and syntax work exactly the way I need for any example.

⁸ You might hope that the --name=my-server argument will prevent the tool from creating multiple copies of the server. However, most IaaS platforms don’t treat user-supplied tags or names as unique identifiers. So this example creates three servers with the same name.

⁹ Other than when things go wrong.

¹⁰ Kids today complain about how terrible YAML is, but I remember the days when XML was everywhere. Writing infrastructure definitions in Ansible’s YAML-based syntax is a luxury compared with writing Ant build scripts in XML.

¹¹ Some changes to an existing infrastructure resource will cause it to be destroyed and a new instance built with the changed value, depending on how the IaaS API works for the particular instance.

¹² Martin Fowler and Rebecca Parsons define a DSL as a “small language, focused on a particular aspect of a software system” in their book Domain-Specific Languages (Addison-Wesley).

¹³ You can see this in the documentation for the Terraform aws_instance resource and the AWS RunInstances API method.

¹⁴ See Gregor Hohpe’s article “IxC: Infrastructure as Code, from Code, with Code”, for an exploration of various combinations of infrastructure, architecture, and code and their implications.

¹⁵ I recommend reading through Brian Grant’s “Infrastructure as Code and Declarative Configuration” series of posts for a thorough critique.

¹⁶ As of early 2025, System Initiative has released its services as General Availability, while ConfigHub is still in early development.