The Good, The Bad, and The Quickly: The Journey to Progressive Delivery
Progressive delivery is the process of releasing updates to a product gradually. You start by releasing the changes to a small audience and then expand to progressively larger audiences. This process helps maintain quality control, lets teams move faster with greater safety, and creates opportunities to learn along the way. All of these benefits make it easier to deliver features that positively impact your business.
Progressive delivery has allowed companies to kill the weekly or monthly “big bang” release night and replace it with smaller and more frequent releases. While the idea of dozens or hundreds of releases per week or even per day may sound scary, it turns out that teams that work this way have both higher throughput and greater stability when compared to their slower-moving peers (see Accelerate: State of DevOps 2019).
The path to progressive delivery is a story of once “radical” ideas becoming mainstream as their benefits proved out. It starts with the move to continuous integration.
With continuous integration, the goal was to always have a working build and for each developer to merge often. If a merge broke the build, then all further commits would be halted until the build was working again. When this process came to the industry, it was considered a “radical” idea.
After continuous integration came the next “radical” idea: continuous delivery. This is the ability to get updates such as new features, configuration changes, and bug fixes to users in a non-disruptive way by automating the release process so it is no longer a manual fire drill. It stopped deployments from being a labor-intensive, multi-hour process filled with opportunities for human error.
Then came perhaps the most “radical” idea yet: continuous deployment. What if a developer could commit code and it would flow all the way through to production without any manual process whatsoever? If all tests pass after the commit, the new code goes live.
So, at this point, DevOps teams were doing continuous integration and either continuous delivery or continuous deployment. Moving faster and automating many (or all) of the steps raised concerns that problems might still be missed, impacting users, possibly in catastrophic ways. This led to the question, “How do we limit the blast radius if problems do arise?” The answer: progressive delivery!
While progressive delivery seems shiny and new, it is actually just a new term for something that has been emerging for some time. The term was coined during a conversation between Sam Guckenheimer of Azure DevOps and James Governor of RedMonk. Importantly, even at its very genesis the term stood not only for containing the blast radius of changes but also for learning during the process.
Progressive delivery is so powerful because it allows us to avoid downtime, limit blast radius, and learn during the process.
- Avoiding downtime is critical, especially in SaaS. Users aren’t big fans of “maintenance windows” and nobody likes unplanned downtime.
- Limiting the blast radius is one of those practices that’s just intuitively obvious once we know that it’s possible. Progressive delivery gives us early visibility of the health of a feature rollout so it can be halted before widespread impact.
- Learning from everything we do gets right back to the core principles of continuous delivery. We don’t just want to identify and halt problems. We want to learn from every iteration.
There are multiple ways to accomplish progressive delivery. While each approach meets one or more of the three goals above, they differ meaningfully in how much release velocity they unlock, particularly in how well they let multiple teams work in parallel, each moving at its own speed. Let’s explore “four shades” of progressive delivery: blue/green deployment, canary releases, feature flag rollouts, and feature delivery platforms.
Blue/green deployment requires that we have two copies of our production infrastructure. The “blue” copy is the one handling production traffic before a release. The “green” copy is used to gradually stage the release and conduct readiness checks. Once green is ready, new traffic is routed to it and existing traffic is drained off of blue.
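The cut-over step above can be sketched in a few lines of Python. This is a minimal illustration, not any particular tool’s API; the `Environment` and `Router` names are invented for the example, and the health check is stubbed out where a real system would probe readiness endpoints.

```python
class Environment:
    def __init__(self, name, version):
        self.name = name
        self.version = version

    def health_check(self):
        # A real check would probe the service's readiness endpoints.
        return True


class Router:
    """Sends all new traffic to a single active environment."""

    def __init__(self, active):
        self.active = active

    def cut_over(self, new_env):
        # Switch only once the staged environment passes its checks.
        if not new_env.health_check():
            raise RuntimeError(f"{new_env.name} failed readiness checks")
        previous, self.active = self.active, new_env
        return previous  # kept warm for a fast rollback


blue = Environment("blue", "v1.0")    # serving production traffic
green = Environment("green", "v1.1")  # staged with the new release
router = Router(active=blue)
standby = router.cut_over(green)      # green is now live; blue is standby
print(router.active.name, standby.name)  # green blue
```

Keeping the old environment warm is the point of the pattern: if the new release misbehaves, rollback is just another cut-over.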
A canary release is a more subtle variation on blue/green. Here the new code is deployed to a subset of your infrastructure, and a small number of users (often as little as 2%) are routed through it. The canary traffic can be monitored closely to identify unforeseen issues. If all goes well, you increase the size of the canary and repeat the process until all servers are on the new release. If issues are found, you back off by draining traffic from the canaries, allowing the rest of the infrastructure (still running the prior release) to carry on.
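The routing decision and the ramp-up can be sketched as follows. This is an illustrative approach, not a specific load balancer’s behavior: it hashes a request identifier into 100 stable buckets, so a given request key is routed consistently and widening the percentage only adds traffic to the canary.

```python
import hashlib

def is_canary(request_id: str, canary_percent: int) -> bool:
    """Deterministically send a stable fraction of traffic to the canary."""
    digest = hashlib.sha256(request_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable bucket in 0..99
    return bucket < canary_percent

# Ramp schedule: widen the canary in steps, monitoring between each one.
for percent in (2, 10, 50, 100):
    hits = sum(is_canary(f"req-{i}", percent) for i in range(10_000))
    print(f"target {percent}% -> observed {hits / 100:.1f}%")
```

In practice each ramp step would be gated on the monitoring described above; here the loop just shows that observed traffic tracks the target percentage.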
Feature flag rollouts are a big leap in the evolution of progressive delivery because they allow DevOps teams to do a canary release at the feature level and control exposure of the new release per user (not per system instance). New code is deployed to the entire infrastructure, turned off by default. Then the code is selectively exposed to users, starting with the development team for testing in production. If that goes well, you can expand the exposure to include employees and look for issues there (this is known as dogfooding). Finally, the new code is gradually released to the general user population in increments: first to 5%, then 10%, then 20%, and so on. Feature flags do a great job of avoiding downtime because turning them on and off doesn’t require a deployment or rollback. Limiting the blast radius is inherent in the user-level control of exposure and the ability to ramp up or down without deployments, infrastructure changes, or network engineering. Feature-level control also means you’ll never have to roll back the “good” features in a release just because one feature has issues.
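The mechanics of a per-user rollout can be sketched in Python. This is a hand-rolled illustration, not the API of any real feature-flagging product: an allowlist covers the dev-team and dogfooding stages, and hashing the flag name together with the user id gives each user a stable bucket, so raising the percentage from 5% to 10% only adds users and never turns anyone off.

```python
import hashlib

class FeatureFlag:
    def __init__(self, name, rollout_percent=0, allowlist=()):
        self.name = name
        self.rollout_percent = rollout_percent
        self.allowlist = set(allowlist)

    def is_enabled(self, user_id: str) -> bool:
        if user_id in self.allowlist:      # dev team / employee dogfooding
            return True
        # Stable per-user bucketing: the same user always lands in the same
        # bucket for this flag, so ramping up only ever adds users.
        key = f"{self.name}:{user_id}".encode()
        bucket = int(hashlib.sha256(key).hexdigest()[:8], 16) % 100
        return bucket < self.rollout_percent

checkout = FeatureFlag("new-checkout", rollout_percent=5,
                       allowlist={"dev-alice"})
print(checkout.is_enabled("dev-alice"))  # True: allowlisted regardless of %
```

Note that the flag check runs on every request, which is what makes “turning it off” instant: no deployment happens, only the next evaluation returns a different answer.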
Finally, feature delivery platforms build on top of feature flag rollout capabilities by automatically monitoring how a release is doing while the rollout is taking place and determining the technical and business impacts of a code change by running statistically valid experiments. In simplified terms, a feature delivery platform goes beyond controlling who should be exposed to the code by also tracking who actually got it and determining how well it’s going for those users and your systems.
Experimentation (often labeled “A/B Testing” even though there can be more than two experiences being tested) is really just the same process of aligning observations with the actual user list exposed and not exposed to a feature, but with greater rigor in the statistics.
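One common way to bring that statistical rigor, and an assumption on my part rather than how any particular platform does it, is a two-proportion z-test comparing a conversion metric between users exposed to the feature and those who were not. A minimal stdlib-only sketch:

```python
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a/n_a: conversions and sample size for the control group.
    conv_b/n_b: conversions and sample size for the exposed group.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)     # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical rollout: 6% conversion for control, 8% for exposed users.
z, p = two_proportion_z(conv_a=120, n_a=2000, conv_b=160, n_b=2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The key point from the paragraph above is the inputs: `n_a` and `n_b` must come from who *actually* got each experience, not from the intended rollout percentages.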
In summary, feature delivery platforms avoid downtime and control the blast radius as well as or better than the other approaches to progressive delivery, while being in a class of their own when it comes to learning during the process. Whether a rollout is halted because of technical issues or because the desired user impact has not materialized, having hard data about what works and what doesn’t brings the promise of a feedback-driven continuous delivery/deployment loop to life.
-Dave Karow, Split