Feature Flags

AUDIENCE

Programmers

We deploy and release independently.

For many teams, releasing their software is the same as deploying their software. They deploy a branch of their code repository into production, and everything in that branch is released. If there’s anything they don’t want to release, they store it in a separate branch.

That doesn’t work for teams using continuous integration and deployment. Other than short-lived development branches, they have only one branch: their integration branch. There’s nowhere for them to hide unfinished work.

Feature flags, also known as feature toggles, solve this problem. They hide code programmatically, rather than using repository branches. This allows teams to deploy unfinished code without releasing it.

Feature flags can be programmed in a variety of ways. Some can be controlled at runtime, allowing people to release new features and capabilities without redeploying the software. This puts releases in the hands of business stakeholders, rather than programmers. They can even be set up to release the software in waves, or to limit releases to certain types of users.

Keystones

Strictly speaking, the simplest type of feature flag isn’t a feature flag at all. Kent Beck calls it a “keystone.” [Beck2004] (ch. 9) It’s easy: when working on something new, wire up the UI last. That’s the keystone. Until the keystone is in place—until the UI is wired up—nobody will know the new code exists, because they won’t have any way to access it.

For example, when I migrated a website to use a different authentication service, I started by implementing an infrastructure wrapper for the new service. I was able to do most of that work without wiring it up to the login button. Until I did, users were unaware of the change, because the login button still used the old authentication infrastructure.

This does raise the question: if you can’t see your changes, how do you test them? The answer is test-driven development and narrow tests. Test-driven development allows you to check your work without seeing it run. Narrow tests target specific functions without requiring them to be hooked up to the rest of your application.

Eventually, of course, you’ll want to see the code run, either to fine-tune the UI (which can be difficult to test-drive), for customer review, or just to double-check your work. TDD isn’t perfect, after all.

Design your new code to be “wired up” with a single line. When you want to see it run, add that line. If you need to integrate before you’re ready to release, comment that line out. When you’re ready to release, write the appropriate test and uncomment the line one final time.

Keystones don’t have to involve a user interface. Anything that hides your work from customers can be used as a keystone. For example, one team used continuous deployment for a rewrite of its website. The team deployed the new site to a real production server, but the server didn’t receive any production traffic. Nobody outside the company could see the new site until the team switched production traffic from the old server to the new one.

Keystones are my preferred approach to hiding incomplete work. They’re simple, straightforward, and don’t require any special maintenance or design work.

Feature Flags

Feature flags are just like keystones, except they use code to control visibility, not a comment. Usually, it’s a simple if statement.

To continue the authentication example, remember that I programmed my new authentication infrastructure without wiring it up to the login button. Before I could wire it up, I needed to test it in production because there were complicated interactions between the third-party service and my systems. But I didn’t want my users to use the new login before I tested it.

I solved this dilemma with a feature flag. My users saw the old login; I saw the new one. The code worked like this (Node.js):

if (siteContext.useAuth0ForAuthentication()) {
  // new Auth0 HTML
}
else {
  // old Persona HTML
}

As part of the change, I had to implement a new email validation page. It wasn’t exposed to existing users, but it was still possible for people to manually type in the URL, so I also used the feature flag to redirect them away:

httpGet(siteContext) {
  if (!siteContext.useAuth0ForAuthentication()) return redirectToAccountPage();
  
}

Feature flags are real code. They need the same attention to quality as the rest of your code. For example, the email validation page had this test:

it("redirects to account page if Auth0 feature flag is off", function() {
  const siteContext = createContext({ auth0: false });
  const response = httpGet(siteContext);
  assertRedirects(response, "/v3/account"));
});

Be sure to remove feature flags after they’re no longer needed. This can be easy to forget, which is one of the reasons I prefer keystones to feature flags. To help you remember, you can add a reminder to your team calendar or a “remove flag” story to your team’s plan. Some teams program their flag code to log an alert or fail tests after its expiration date has passed.

How does your code know when the flag is enabled? In other words, where do you implement your equivalent of useAuth0ForAuthentication()? You have several options.

Prerequisites

Anybody can use keystones. Feature flags run the risk of growing out of control, so your team needs to pay attention to their design and removal, especially as they multiply. Collective code ownership and reflective design help.

Despite their superficial similarity to privileges that control user access to features, feature flags are meant to be temporary. Don’t use feature flags as a replacement for proper user access control.

Alternatives and Experiments

Feature branches are a common alternative to keystones and feature flags. When someone starts working on a new feature, they create a branch, and they don’t merge that branch back into the rest of the code until the feature is done. This is effective at keeping unfinished changes out of the hands of customers, but significant refactorings tend to cause merge conflicts. This makes it a poor choice for Delivering teams, which rely on refactoring and reflective design to keep costs low.

Keystones are so simple, they don’t leave a lot of room for experimentation. Feature flags, on the other hand, are ripe for exploration. Look for ways to keep your feature flags organized and the design clean. Consider how your flags can provide new business capabilities. For example, feature flags are often used for A/B testing, which involves showing different versions of your software to different users, then making decisions based on the results.

As you experiment, remember that simpler is better. Although keystones may seem like a cheap trick, they’re very effective, and they keep the code clean. It’s easy for feature flags to get out of control. Stick with simple solutions whenever you can.