“Sorry, that just sounds… dangerous”.
I sometimes hear things like this when explaining concepts like Trunk-based Development and Feature Flagging to someone who’s not worked that way before. The testers on a team are often the first to speak up.
I’m all for healthy skepticism, and when it comes to testing and feature flags some misgivings are justified. A professional tester might well be a little uncomfortable with the idea of engineers knowingly merging half-finished code into master, and then pushing that code to production(!!!). Feature flagging makes this crazy idea feasible - we protect users from that half-finished feature by placing it behind a flag. But, how can we be sure that a flag is working as intended? We need some way to verify the behavior of a feature flag, without actually flipping it on and exposing our end-users to half-finished, untested code.
In this article we’ll learn how to do just that - test a feature flag change without impacting end-users - by exploring these different strategies:
- Turn the feature on in a pre-production environment
- Turn the feature on in production, but only for testers
- Override the state of a flag for testing
A worked example - free shipping
We’ll explore these options in more detail using a hypothetical example. You’re a tester working at an online retailer, and your team is getting ready to release a new feature - free shipping for all purchases over $100. This feature turns out to be quite complex to implement, and it’s been under development for a few weeks. Your team strives to avoid long-lived feature branches whenever possible, so they’ve been using trunk-based development techniques - implementing the feature as a series of incremental changes on the team’s shared master branch, protected by a feature flag.
At this point the engineers believe the feature is complete, and ready for release. You’d like to test it by flipping the free-shipping-for-big-purchases feature flag on - but you don’t want to flip that flag on for end users before you’ve had a chance to verify the feature. What are your options?
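To ground the example, here’s a minimal sketch of how the flag might guard the new behavior. The flag name comes from the article; the `FeatureFlags` class and the shipping amounts are hypothetical stand-ins for a real flagging system.

```python
FREE_SHIPPING_THRESHOLD = 100.00
STANDARD_SHIPPING = 5.99


class FeatureFlags:
    """Toy in-memory flag store standing in for a real feature-flagging system."""

    def __init__(self, flags=None):
        self._flags = flags or {}

    def is_enabled(self, flag_name):
        # Unknown flags default to off - half-finished features stay hidden.
        return self._flags.get(flag_name, False)


def shipping_cost(order_total, flags):
    """The new free-shipping behavior only applies when the flag is on."""
    if flags.is_enabled("free-shipping-for-big-purchases") and order_total > FREE_SHIPPING_THRESHOLD:
        return 0.00
    return STANDARD_SHIPPING
```

With the flag off, a $150 order still pays standard shipping - which is exactly why a tester needs some way to flip the flag on without doing so for everyone.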
Testing in a pre-production environment
One obvious way to test a change without impacting users is to make that change outside of your production environment. This shouldn’t sound like a revolutionary concept - it’s the way almost all changes are tested when you’re not using feature flags.
We can use this same approach for testing the effect of a feature flag itself. We can take a pre-production environment running the latest production deploy (or close to it), turn our free-shipping-for-big-purchases feature flag on, and then test away. We might use a shared pre-production environment for this, or stand up a dedicated personal environment.
Working outside of production means you can be confident that you won’t expose your users to half-finished work. However, a non-production environment also imposes a large overhead. There’s the general cost of running an additional environment, keeping deployments up to date, managing data, and so on. Additionally, since you’re using the environment to test feature flag changes you also need to ensure that the flag configuration for that environment tracks what’s in production. Another thing to keep in mind is that making feature flag changes in a shared pre-production environment could end up impacting other users of the environment, who might be confused by half-finished features being turned on.
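The requirement that a pre-production environment’s flag configuration tracks production can itself be checked mechanically. The sketch below is a hypothetical per-environment configuration (the environment and flag names besides free-shipping-for-big-purchases are invented), with a small helper that reports any flag whose state has drifted, other than the one deliberately under test.

```python
# Hypothetical per-environment flag configuration.
flag_config = {
    "production": {
        "free-shipping-for-big-purchases": False,
        "new-checkout-flow": True,
    },
    "staging": {
        "free-shipping-for-big-purchases": True,  # deliberately on for testing
        "new-checkout-flow": True,                # should track production
    },
}


def config_drift(base_env, test_env, flag_under_test):
    """Flags whose state differs between two environments, excluding the one
    we've intentionally flipped for testing."""
    base, test = flag_config[base_env], flag_config[test_env]
    return {
        flag
        for flag in base
        if base[flag] != test.get(flag) and flag != flag_under_test
    }
```

A check like this, run before a test session, guards against the confusion described above: unexpected half-finished features being on (or off) in the shared environment.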
Testing in production
Feature flagging allows you to test a change directly in your production environment, without exposing it to all users. We can do this a few different ways.
Targeted release
Many feature-flagging systems allow you to turn a feature on for a specific user, or group of users. We can use this capability to turn the free-shipping-for-big-purchases feature on in production, but only for ourselves, and then test the functionality. This is sort of like a Canary Release, but targeted just at internal testers.
This sort of targeted release is a popular approach for testing a feature prior to launch, and it’s a good option, as long as your feature-flagging system supports it. However, there are a few scenarios where it’s not useful. This technique doesn’t apply if you need to test a feature prior to login, as an anonymous user. It also won’t help when you want to validate both sides of an A/B test, unless your feature-flagging system allows you to force a user into a specific bucket (most don’t).
Flag overrides
Some feature-flagging systems provide the ability to temporarily override the state of a flag, by setting a magical query parameter or a cookie when making a web request. For a mobile app or web SPA a user can request the override using a hidden dev screen[1].
If we have this capability we can use it for testing. We’d turn the free shipping feature on for ourselves by overriding the free-shipping-for-big-purchases flag, then test the feature.
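A sketch of what a query-parameter override might look like, layered on top of the configured flag state. The parameter naming convention (`ff-override-<flag>`) is an assumption for illustration; real systems vary.

```python
from urllib.parse import parse_qs, urlparse


def flag_state(flag_name, request_url, configured_flags):
    """Resolve a flag's state for a request: a query-parameter override, if
    present, wins over the centrally configured state."""
    params = parse_qs(urlparse(request_url).query)
    override = params.get(f"ff-override-{flag_name}")
    if override:
        return override[0].lower() == "true"
    # No override supplied - fall back to the configured state (default off).
    return configured_flags.get(flag_name, False)
```

A tester can then hit `https://shop.example/cart?ff-override-free-shipping-for-big-purchases=true` and see the feature, while every other request resolves to the configured (off) state.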
These types of flag overrides are particularly useful when working with marketing pages and “top of funnel” signup flows - areas that often make heavy use of A/B testing and where most users are anonymous.
However, this type of flag override system can be quite invasive to implement, since you typically need to hook into your application’s HTTP request context, or implement a custom dev-only UI. It can also be a little harder to track what the current state of a feature flag is when there’s the additional possibility of a local override. When it’s feasible to do so, the user-targeting capabilities of your feature-flagging system are a better option.
So, what should we use?
Each of the approaches I’ve outlined has pros and cons. On balance, I think that testing features in production via a targeted release is usually a good bet, assuming your feature-flagging system supports it. Having the option for local overrides is also handy, particularly for testing marketing and user-acquisition features. I don’t recommend using pre-production environments for this sort of testing unless there’s no alternative.
Other aspects of testing and feature flags
There’s a lot more to talk about when it comes to testing and feature flags. Some topics which we didn’t look at in this post (but could in future posts) include:
- Automated Testing: What capabilities does your flagging system need in order to best support automated testing? What additional automated testing should you do for a flagged feature?
- Feature flag testing strategy: How does a tester decide which combination of flags to test when?
- Managing feature flag configuration across environments: Should we have distinct flag configuration for each environment, or should it be shared? How do we keep flag configuration in sync across environments, and how do we promote or migrate a flag configuration change?
If you’re interested in posts on these additional topics, or others, don’t be shy! Let me know by sending me a toot or otherwise getting in touch.
[1] A hidden “god mode” screen is a really nice thing to have in any sort of rich client app, with an area for feature flag management if you’re using flags heavily. It will typically display the current set of flags, a description of what each flag does, its current state, and the ability to manage flag state overrides.