The Road to Continuous Deployment (part 4)

The story below details an interesting and transformational project that I was part of in 2014 and 2015, at a Dutch company. I’ve told this story before during a number of conference talks (slides and videos are available, if you’re interested), and I’ve now finally gotten around to writing it up as a series of blog posts!

This is part 4, the final installment of a multipart series on how CD, DevOps, and other principles were used to overhaul an existing software application powering multiple online job boards, during a project at a large company in The Netherlands.

In the previous installment, I discussed code coverage and other metrics, testing, pipelines and deployments. Now, it’s time to wrap this series up by sharing and discussing a few results!

Results and statistics

Per service, we ended up with a total build+deploy time of slightly under 10 minutes. This is measured from the start of the pipeline (triggered by pushing a commit) to the moment all production instances have been replaced with new ones. Across all the services, we averaged about 50 successful pipeline runs per day.

With our fast pipelines, testing strategy and quality improvements, we were able to significantly reduce the number of issues and outages. Gone are the days of releasing on the weekend or with downtime! This directly contributed to one of the biggest changes for the team (even though it’s hard to quantify): their confidence in changing the (new) system.

Additionally, page load times improved significantly, from 5+ seconds down to under a second. Audience statistics and metrics improved across the board: click-through rates, number of pages visited, session duration, etc.

During this project, the team was exposed to and able to learn about more and more (relatively) new and exciting technologies, such as Docker, Elasticsearch and others.

Did everything go well? Of course not.

Initially, some team members were hesitant to accept the new way of working. That is not surprising when you start talking to a team that has been working on a legacy system for years and has become a little numb in the process. By persevering, showing the team the way and doing a lot of pair programming, we were able to demonstrate that the new way of working was significantly better.

Another thing was pipeline stability. Because a pipeline is the only path to production for changes, dealing with a failing pipeline has a high priority. If the pipeline failure is caused by external factors, such as network connections failing or upstream package repositories becoming unavailable, fixing those issues may be non-trivial. It is worth the effort, though: an unstable pipeline is a potential threat to the credibility, and hence the successful adoption, of CD.
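One common way to soften failures caused by external factors is to retry a flaky step a few times before failing the whole pipeline. The sketch below is only an illustration of that idea, not what we ran in this project: `TransientError`, `run_with_retries` and the parameter names are all hypothetical, and it assumes the step can tell a transient failure (a network timeout, say) apart from a real one.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a network timeout)."""


def run_with_retries(
    step: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run a pipeline step, retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except TransientError:
            if attempt == attempts:
                # Out of retries: let the pipeline fail visibly.
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Any other exception propagates immediately, so a genuinely broken build still fails fast. Retries only hide symptoms, of course; a step that needs them regularly is a signal to fix the underlying dependency.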

Last but not least: keep an eye on the number of feature toggles you have in your system. A hard cap on that number may or may not be feasible, but a large number of feature toggles becomes an anti-pattern in itself, especially if most of those toggles are off (hiding a feature).

For this project, we endeavored to remove feature toggles as soon as possible. In short, that meant that as soon as a feature toggle was switched to “on”, the toggle was removed directly. If it turned out the feature wasn’t interesting enough to go live, the toggle and all the code for the new feature were removed. Indeed, Martin Fowler suggested treating feature toggles as inventory and striving to keep that inventory as low as possible.
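To make the "toggles as inventory" idea concrete, here is a minimal sketch of a check that lists toggles due for cleanup. It is not the tooling we used: the `Toggle` record, the `removal_candidates` function and the 30-day threshold are assumptions made purely for illustration. It flags every toggle that is already on (the feature is live, so the toggle can go) and every toggle that has been off for longer than the threshold (time to decide whether the feature ships or gets deleted).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class Toggle:
    """Hypothetical record describing one feature toggle."""
    name: str
    enabled: bool
    created: date


def removal_candidates(
    toggles: Iterable[Toggle], today: date, max_age_days: int = 30
) -> List[str]:
    """Return the names of toggles that should be cleaned up, sorted by name."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        t.name
        for t in toggles
        # Enabled: feature is live, remove the toggle.
        # Disabled and older than the cutoff: ship the feature or delete it.
        if t.enabled or t.created < cutoff
    )
```

A check like this could run as a pipeline step that warns (or fails) when the list grows, which keeps the inventory visible to the whole team.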

Conclusion

This marks the end of the “Road to Continuous Deployment” article series. I hope you liked it! Let me know what you think in the comments.

Michiel Rook's blog
