Ten years ago, a CIO at a global bank scoffed when I suggested they look into private cloud technologies and infrastructure automation tooling: “That kind of thing might be fine for startups, but we’re too large and our requirements are too complex.” Even a few years ago, many enterprises considered using public clouds to be out of the question.
These days cloud technology is pervasive. Even the largest, most hidebound organizations are rapidly adopting a “cloud-first” strategy. Those organizations that find themselves unable to consider public clouds are adopting dynamically provisioned infrastructure platforms in their data centers.1 The capabilities that these platforms offer are evolving and improving so quickly that it’s hard to ignore them without risking obsolescence.
Cloud and automation technologies remove barriers to making changes to production systems, and this creates new challenges. While most organizations want to speed up their pace of change, they can’t afford to ignore risks and the need for governance. Traditional processes and techniques for changing infrastructure safely are not designed to cope with a rapid pace of change. These ways of working tend to throttle the benefits of modern, Cloud Age technologies—slowing work down and harming stability.2
In Chapter 1 I use the terms “Iron Age” and “Cloud Age” (“From the Iron Age to the Cloud Age”) to describe the different philosophies that apply to managing physical infrastructure, where mistakes are slow and costly to correct, and managing virtual infrastructure, where mistakes can be quickly detected and fixed.
Infrastructure as Code tools create the opportunity to work in ways that help you to deliver changes more frequently, quickly, and reliably, improving the overall quality of your systems. But the benefits don’t come from the tools themselves. They come from how you use them. The trick is to leverage the technology to embed quality, reliability, and compliance into the process of making changes.
I wrote the first edition of this book because I didn’t see a cohesive collection of guidance on how to manage Infrastructure as Code. There was plenty of advice scattered across blog posts, conference talks, and documentation for products and projects. But a practitioner needed to sift through everything and piece a strategy together for themselves, and most people simply didn’t have time.
The experience of writing the first edition was amazing. It gave me the opportunity to travel and to talk with people around the world about their own experiences. These conversations gave me new insights and exposed me to new challenges. I learned that the value of writing a book, speaking at conferences, and consulting with clients is that it fosters conversations. As an industry, we are still gathering, sharing, and evolving our ideas for managing Infrastructure as Code.
Things have moved along since the first edition came out in June 2016. That edition was subtitled “Managing Servers in the Cloud,” which reflected the fact that most infrastructure automation until that point had been focused on configuring servers. Since then, containers and clusters have become a much bigger deal, and the infrastructure action has moved to managing collections of infrastructure resources provisioned from cloud platforms—what I call stacks in this book.
As a result, this edition involves more coverage of building stacks, which is the remit of tools like CloudFormation and Terraform. The view I’ve taken is that we use stack management tools to assemble collections of infrastructure that provide application runtime environments. Those runtime environments may include servers, clusters, and serverless execution environments.
I’ve changed quite a bit based on what I’ve learned about the evolving challenges and needs of teams building infrastructure. As I’ve already touched on in this preface, I see making it safe and easy to change infrastructure as the key benefit of Infrastructure as Code. I believe people underestimate the importance of this by thinking that infrastructure is something you build and forget.
But too many teams I meet struggle to satisfy the needs of their organizations; they are not able to expand and scale quickly enough, support the pace of software delivery, or provide the reliability and security expected. When we dig into the details of their challenges, we find that they are overwhelmed by the need to update, fix, and improve their systems. So I’ve doubled down on this as the core theme of this book.
This edition introduces three core practices for using Infrastructure as Code to make changes safely and easily:
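Define everything as code: This one is obvious from the name, and creates repeatability and consistency.
Continuously test and deliver all work in progress: Each change enhances safety. It also makes it possible to move faster and with more confidence.
Build small, simple pieces that you can change independently: These are easier and safer to change than larger pieces.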
These three practices are mutually reinforcing. Code is easy to track, version, and deliver across the stages of a change management process. It’s easier to continuously test smaller pieces. Continuously testing each piece on its own forces you to keep a loosely coupled design.
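As a rough illustration of how these practices reinforce one another, here is a minimal Python sketch. It is not the syntax of any real tool or of the fictional tools used in this book; the names `StackDefinition` and `validate`, and the two rules being checked, are assumptions made up for this example. Each piece of infrastructure is defined as code, kept small, and checked on its own.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StackDefinition:
    """A small piece of infrastructure, defined as code (hypothetical model)."""
    name: str
    server_count: int
    open_ports: Tuple[int, ...] = ()


def validate(stack: StackDefinition) -> List[str]:
    """Check one stack on its own; an empty list means no problems were found."""
    problems = []
    # Illustrative rules only; a real team would encode its own policies.
    if stack.server_count < 1:
        problems.append(f"{stack.name}: needs at least one server")
    if 22 in stack.open_ports:
        problems.append(f"{stack.name}: port 22 must not be open")
    return problems


# Two small, independent pieces rather than one large definition.
web = StackDefinition(name="web", server_count=2, open_ports=(443,))
db = StackDefinition(name="db", server_count=0, open_ports=(5432,))

# Each piece is tested separately, so a failure points at one small definition.
results = {stack.name: validate(stack) for stack in (web, db)}
```

Because each definition is validated without reference to the others, a check like this could run on every change, which is only practical when the pieces stay small and loosely coupled.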
These practices and the details of how to apply them are familiar from the world of software development. I drew on Agile software engineering and delivery practices for the first edition of the book. For this edition, I’ve also drawn on rules and practices for effective design.
In the past few years, I’ve seen teams struggle with larger and more complicated infrastructure systems, and I’ve seen the benefits of applying lessons learned in software design patterns and principles, so I’ve included several chapters in this book on how to do this.
I’ve also seen that organizing and working with infrastructure code is difficult for many teams, so I’ve addressed various pain points. I describe how to keep codebases well organized, how to provide development and test instances for infrastructure, and how to manage the collaboration of multiple people, including those responsible for governance.
I don’t believe we’ve matured as an industry in how we manage infrastructure. I’m hoping this book gives a decent view of what teams are finding effective these days, along with a bit of aspiration for what we can do better.
I fully expect that in another five years the toolchains and approaches will evolve. We could see more general-purpose languages used to build libraries, and we could be dynamically generating infrastructure rather than defining the static details of environments at a low level. We certainly need to get better at managing changes to live infrastructure. Most teams I know are scared when applying code to live infrastructure. (One team referred to Terraform as “Terrorform,” but users of other tools feel the same way.)
The thesis of this book is that exploring different ways of using tools to implement infrastructure can help us to improve the quality of services we provide. We aim to use speed and frequency of delivery to improve the reliability and quality of what we deliver.
So the focus of this book is less on specific tools, and more on how to use them.
Although I mention examples of tools for particular functions like configuring servers and provisioning stacks, you won’t find details of how to use a particular tool or cloud platform. You will find patterns, practices, and techniques that should be relevant to whatever tools and platforms you use.
You won’t find code examples for real-world tools or clouds. Tools change too quickly in this field to keep code examples accurate, whereas the advice in this book should age more slowly and apply across tools. Instead of real-world code, I write pseudocode examples for fictional tools to illustrate concepts. See the book’s companion website for references to example projects and code.
This book won’t guide you on how to use the Linux operating system, Kubernetes cluster configuration, or network routing. The scope of this book does include ways to provision infrastructure resources to create these things, and how to use code to deliver them. I share different cluster topology patterns and approaches for defining and managing clusters as code. I describe patterns for provisioning, configuring, and changing server instances using code.
You should supplement the practices in this book with resources on the specific operating systems, clustering technologies, and cloud platforms. Again, this book explains approaches for using these tools and technologies that are relevant regardless of the particular tool.
This book is also light on operability topics like monitoring and observability, log aggregation, identity management, and other concerns that you need to support services in a cloud environment. What’s in here should help you to manage the infrastructure needed for these services as code, but the details of the specific services are, again, something you’ll find in more specific resources.
Infrastructure as Code tools and practices emerged well before the term. Systems administrators have been using scripts to help them manage systems since the beginning. Mark Burgess created the pioneering CFEngine system in 1993. I first learned practices for using code to fully automate provisioning and updates of servers from the Infrastructures.org website in the early 2000s.3
Infrastructure as Code has grown along with the DevOps movement. Andrew Clay-Shafer and Patrick Debois triggered the DevOps movement with a talk at the Agile 2008 conference. The first uses I’ve found for the term “Infrastructure as Code” are from a talk called “Agile Infrastructure” that Clay-Shafer gave at the Velocity conference in 2009, and an article John Willis wrote summarizing the talk. Adam Jacob, who cofounded Chef, and Luke Kanies, founder of Puppet, were also using the phrase around this time.
This book is for people who are involved in providing and using infrastructure to deliver and run software. You may have a background in systems and infrastructure, or in software development and delivery. Your role may be engineering, testing, architecture, or management. I’m assuming you have some exposure to cloud or virtualized infrastructure and tools for automating infrastructure using code.
Readers new to Infrastructure as Code should find this book a good introduction to the topic, although you will get the most out of it if you are familiar with how infrastructure cloud platforms work, and the basics of at least one infrastructure coding tool.
Those who have more experience working with these tools should find a mixture of familiar and new concepts and approaches. The content should create a common language and articulate challenges and solutions in ways that experienced practitioners and teams find useful.
I use the terms principles, practices, and patterns (and antipatterns) to describe essential concepts. Here are the ways I use each of these terms:
A principle is a rule that helps you to choose between potential solutions.
A practice is a way of implementing something. A given practice is not always the only way to do something, and may not even be the best way to do it for a particular situation. You should use principles to guide you in choosing the most appropriate practice for a given situation.
A pattern is a potential solution to a problem. It’s very similar to a practice in that different patterns may be more effective in different contexts. Each pattern is described in a format that should help you to evaluate how relevant it is for your problem.
An antipattern is a potential solution that you should avoid in most situations. Usually, it’s either something that seems like a good idea or else it’s something that you fall into doing without realizing it.
I use a fictional company called ShopSpinner to illustrate concepts throughout this book. ShopSpinner builds and runs online stores for its customers.
ShopSpinner runs on FCS, the Fictional Cloud Service, a public IaaS provider with services that include FSI (Fictional Server Images) and FKS (Fictional Kubernetes Service). It uses the Stackmaker tool—an analog of Terraform, CloudFormation, and Pulumi—to define and manage infrastructure on its cloud. It configures servers with the Servermaker tool, which is much like Ansible, Chef, or Puppet.
ShopSpinner’s infrastructure and system design may vary depending on the point I’m using it to make, as will the syntax of the code and command-line arguments for its fictional tools.
The following typographical conventions are used in this book:
Italic: Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width: Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold: Shows commands or other text that should be typed literally by the user.
Constant width italic: Shows text that should be replaced with user-supplied values or by values determined by context.
This element signifies a tip or suggestion.
This element signifies a general note.
This element indicates a warning or caution.
For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.
Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit http://oreilly.com.
Please address comments and questions concerning this book to the publisher:
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/infra-as-code-2e.
Email bookquestions@oreilly.com to comment or ask technical questions about this book.
For news and information about our books and courses, visit http://oreilly.com.
As with the first edition, this book is not my product—it’s the result of collating and consolidating as best I could what I’ve learned from more people than I’m able to remember and properly credit. Apologies and thanks to those I’ve forgotten to name here.
I always enjoy knocking ideas around with James Lewis; our conversations and his writing and talks have directly and indirectly influenced much of what’s in this book. He kindly shared his deep experience on software design and broad awareness of various other topics by giving me thoughts on a near-final draft of the book. His suggestions helped me to tighten up the connections I’ve tried to draw between software engineering and Infrastructure as Code.
Martin Fowler has generously supported my efforts from the beginning. His skill at drawing from various people’s experiences, applying his knowledge and perception, and shaping it all into clear, helpful advice inspires me.
Thierry de Pauw has been a most thoughtful and helpful reviewer. He read multiple drafts and shared his reactions, telling me what he found new and useful, which ideas aligned with his own experiences, and which parts didn’t come across clearly to him.
I need to thank Abigail Bangser, Jon Barber, Max Griffiths, Anne Simmons, and Claire Walkley for their encouragement and inspiration.
People I’ve worked with gave me thoughts and ideas that improved this book. James Green shared insights into data engineering and machine learning in the context of infrastructure. Pat Downey explained his use of expand and contract for infrastructure. Vincenzo Fabrizi pointed out to me the value of inversion of control for infrastructure dependencies. Effy Elden is an endless fount of knowledge about the infrastructure tooling landscape. Moritz Heiber directly and indirectly influenced the contents of this book, although it would be too much to hope that he agrees with 100% of it.
At ThoughtWorks I have the opportunity to interact with and discuss Infrastructure as Code and related topics with many colleagues and clients in workshops, projects, and online forums. A few of these people include Ama Asare, Nilakhya Chatterjee, Audrey Conceicao, Patrick Dale, Dhaval Doshi, Filip Fafara, Adam Fahie, John Feminella, Mario Fernandez, Louise Franklin, Heiko Gerin, Jarrad “Barry” Goodwin, Emily Gorcenski, James Gregory, Col Harris, Prince M Jain, Andrew Jones, Aiko Klostermann, Charles Korn, Vishwas Kumar, Punit Lad, Suya Liu, Tom Clement Oketch, Gerald Schmidt, Boss Supanat Pothivarakorn, Rodrigo Rech, Florian Sellmayr, Vladimir Sneblic, Isha Soni, Widyasari Stella, Paul Valla, Srikanth Venugopalan, Ankit Wal, Paul Yeoh, and Jiayu Yi. Also thanks to Kent Spillner—this time I remember why.
Plenty of people reviewed various drafts of this edition of the book and shared feedback, including Artashes Arabajyan, Albert Attard, Simon Bisson, Phillip Campbell, Mario Cecchi, Carlos Conde, Bamdad Dashtban, Marc Hofer, Willem van Ketwich, Barry O’Reilly, Rob Park, Robert Quinlivan, Wasin Watthanasrisong, and Rebecca Wirfs-Brock.
Deep thanks to Virginia Wilson, my editor, who has kept me going throughout the long and grueling process of making this book happen. My colleague, John Amalanathan, turned my wonky diagrams into the slick artwork you see here, with great patience and diligence.
My employer, ThoughtWorks, has been an enormous supporter: firstly, by creating the environment for me to learn from phenomenal people; secondly, by fostering a culture that encourages its members to share ideas with the industry; and thirdly, by supporting me as I worked with other ThoughtWorkers and clients to explore and test new ways of working. Ashok Subramanian, Ruth Harrison, Renee Hawkins, Ken Mugrage, Rebecca Parsons, and Gayathri Rao, among others, have helped me make this more than a personal project.
Last and most, everlasting love to Ozlem and Erel, who endured my obsession with this book. Again.
1 For example, many government and financial organizations in countries without a cloud presence are prevented by law from hosting data or transactions abroad.
2 The research published by DORA in the State of DevOps Report finds that heavyweight change-management processes correlate with poor performance on change failure rates and other measures of software delivery effectiveness.
3 The original content remained on this site as of summer 2020, although it had not been updated since 2007.