Over the past few months I built deployment pipelines that make use of the blue-green, or zero-downtime, deployment schema.
I gained a lot of experience doing so and also saw the limitations of this concept.
There is not a lot of content on the internet about this, which is why I want to give you my take on it. This blog post will thus cover the pros and cons of this approach.
What is Blue-Green and why would we want to use it?
The idea behind the blue-green deployment schema is that we have two versions of one web application running at all times but only one of them is active at a time while the other one is passive.
A load balancer is set in front of those web applications and will balance the load only to the active node.
When we want to deploy a new version, we deploy it to the passive web first and execute our tests against that instance. We now have time to make sure that everything works as we expect it to work.
After all our tests are successful and we are sure we want to take it live, we switch the passive web to active and the active web to passive.
This way there is no downtime and the tested application is live in an instant.
If there is an unexpected error, we can always switch back to the previous version with the click of a button.
It is called blue-green because that is the way we differentiate both webs. One is called blue and the other green. It is not important which one we deploy to as long as we deploy to the currently passive node. (We could give them other names, too, of course.)
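To make the switch a bit more concrete, here is a minimal sketch in Python. It is only a toy model of the idea, not a real load balancer: the class name BlueGreenRouter, its methods and the version strings are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BlueGreenRouter:
    """Toy stand-in for a load balancer that routes only to the active node."""

    active: str = "blue"
    # Version currently installed on each node (illustrative values).
    versions: dict = field(default_factory=lambda: {"blue": "1.0", "green": "1.0"})

    @property
    def passive(self) -> str:
        # Whichever node is not active is the passive one.
        return "green" if self.active == "blue" else "blue"

    def deploy(self, version: str) -> str:
        """Install a new version on the passive node; live traffic is untouched."""
        target = self.passive
        self.versions[target] = version
        return target

    def switch(self) -> None:
        """Swap roles: the passive node becomes active and vice versa."""
        self.active = self.passive

    def live_version(self) -> str:
        """Version that users currently see."""
        return self.versions[self.active]


router = BlueGreenRouter()
target = router.deploy("1.1")  # lands on the passive node
# ... run the tests against `target` here ...
router.switch()                # the tested version is now live
router.switch()                # switching again rolls back to the previous version
```

Note that a rollback is nothing more than a second switch, which is why going back is so cheap as long as the previous version is still installed on the other node.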
This all sounds nice and it is, really. However, there are a few drawbacks and things you have to consider before switching over from the old and tested way.
What are the limitations and how do we work around those?
One major problem with this approach is that both web applications have to work with one and the same database because data would otherwise be lost when switching between the applications (and their possibly different database versions).
This limits the zero-downtime claim, which can only realistically be upheld if code-based database migrations are part of the deployed applications. If you do not have such migrations, there is a timespan in which your live application has to work with a database schema that is either already updated or still outdated. The code will not reflect that schema, which will lead to errors if someone accesses the page.
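As a rough illustration of what "code-based migrations as part of the application" can look like, here is a small sketch using Python's built-in sqlite3 module. The table, the column names and the MIGRATIONS list are hypothetical, and a real project would more likely use the migration tooling of its own framework. The sketch also assumes that migrations only add things, so that the version that is still live keeps working against the updated schema.

```python
import sqlite3

# Hypothetical migrations, ordered by schema version.
# Assumption: each step is additive, so older code can still use the schema.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN email TEXT"),
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply every migration newer than the stored schema version."""
    # SQLite keeps a free-to-use integer per database file in user_version.
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in MIGRATIONS:
        if version > current:
            conn.execute(statement)
            # PRAGMA does not accept bound parameters, hence the int() cast.
            conn.execute(f"PRAGMA user_version = {int(version)}")
            conn.commit()
            current = version
    return current


# The application would call this on startup, before serving requests.
conn = sqlite3.connect(":memory:")
migrate(conn)  # brings a fresh database up to the latest version
migrate(conn)  # running it again changes nothing
```

Because the schema version travels with the code, the node that was just deployed brings the database to the state it expects, instead of relying on a manual step at switch time.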
In nearly the same way, there is also an issue with all other systems that are not the deployed application. If you have micro-services, for example, you have to deploy those in the blue-green manner as well.
This, however, would necessitate that one passive system is able to communicate with another passive system.
With URLs pre-configured for certain environments, you need some way to determine at runtime whether the current application is passive or active, and then communicate with the other passive or active applications accordingly. Thus, there is massive overhead in switching from traditional deployment processes to this schema.
If that is not possible, it is as it always has been: Changes that span multiple systems will cause problems while at least one of the systems is outdated.
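The runtime lookup mentioned above could be sketched roughly like this. Everything in it is an assumption made for illustration: the service name, the example.com URLs and the idea that each application knows its own color and can ask some source of truth which color is currently active.

```python
# Hypothetical endpoints: one URL per role for every dependent service.
SERVICE_URLS = {
    "orders": {
        "active": "https://orders.example.com",
        "passive": "https://orders-passive.example.com",
    },
}


def current_role(own_color: str, active_color: str) -> str:
    """Return 'active' if this instance is the live one, otherwise 'passive'."""
    return "active" if own_color == active_color else "passive"


def service_url(service: str, own_color: str, active_color: str) -> str:
    """Pick the URL of a dependency that has the same role as this instance."""
    role = current_role(own_color, active_color)
    return SERVICE_URLS[service][role]


# A green instance while blue is live talks to the passive orders service.
url = service_url("orders", own_color="green", active_color="blue")
```

The important design point is that the role is looked up on every call rather than baked into the configuration at deploy time, because the same instance changes its role the moment the switch happens.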
In the following posts, I will show you how to implement this with real-life examples using Azure DevOps YAML Pipelines and a Windows Server environment.