How we manage code deploys at OpenSky
At OpenSky we've gone through a few different iterations of how we manage changes to our software. This post aims to describe some of the changes we went through and why.
How did we historically operate?
Historically we had 3 branches named develop, release and production. Developers worked on changes in a branch created from develop, and once the work was done, the branch was merged back into develop. Every 2 weeks, we moved everything from develop to release, where it was all tested together before being merged into production. We then created a tag from the production branch and deployed that tag. Hotfixes and other small changes were sometimes branched from and merged directly into the production branch.
This approach worked, but it had a few problems for us:
If we merged 20 different changes into the develop branch and 1 of them had an issue, it blocked the other 19 from going to production until that issue was resolved. If the bug wasn't fixed quickly, we had a process to revert the change out of the release branch so that everything else could move forward.
If 1 of the 20 changes caused an issue once it was deployed to production, backing it out quickly was harder and more time-consuming. It required a git revert and a new tag. A revert is needed either way, but there was no way to flip just that one change back instantly without going through the full deploy process.
If there was a problem during testing or in production, it was hard to know which of the 20 changes caused it, so everyone had to jump in to figure out whether it was their change or someone else's.
We had two different paths and processes for getting something to production, so the question always came up: should a change be branched from develop or from production?
Having multiple branches made other things more complicated and messy too, in ways I can't remember anymore :)
How did we improve this?
Over the years we invested heavily in our automated test and deploy processes. The goal was to make them fast, stable and easy to roll back if something went sideways. A developer should be able to push up a change and get feedback from the automated test suite in 2-3 minutes or less. It used to take up to an hour to get that feedback because the tests all ran serially, so we built a tool in-house called PHPChunkit that runs the tests in parallel chunks across multiple servers. On top of the speed issues, test coverage was poor, so every change required a lot of manual QA before it could go to production. Because manual QA was so costly, testing and deploying each change in isolation was not practical. That is one of the reasons we batched changes up and tested and deployed them together every 2 weeks, as described above.
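To make the chunking idea concrete, here is a minimal Python sketch of one way to split a test suite across servers. It is not PHPChunkit's code and may not match its strategy. It assumes each test file has an estimated cost (for example its last recorded runtime or its number of test methods) and uses a greedy "most expensive file into the lightest chunk" assignment. Because the ordering is deterministic, every server can compute the same split independently and run only its own chunk.

```python
import heapq


def chunk_tests(test_costs: dict[str, float], num_chunks: int) -> list[list[str]]:
    """Split test files into num_chunks groups with roughly equal total cost.

    Greedy "longest processing time first": each file, most expensive first,
    goes into whichever chunk currently has the smallest total cost.
    """
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    # Min-heap of (total_cost, chunk_index): the lightest chunk is always on top,
    # and ties are broken by chunk index so the result is deterministic.
    heap = [(0.0, i) for i in range(num_chunks)]
    heapq.heapify(heap)
    chunks: list[list[str]] = [[] for _ in range(num_chunks)]
    # Most expensive files first; equal costs are ordered by path for determinism.
    for path, cost in sorted(test_costs.items(), key=lambda item: (-item[1], item[0])):
        total, index = heapq.heappop(heap)
        chunks[index].append(path)
        heapq.heappush(heap, (total + cost, index))
    return chunks
```

For example, with hypothetical costs CheckoutTest.php=5, CartTest.php=4, OrderTest.php=3, ProductTest.php=3, SearchTest.php=2 and UserTest.php=1 split into 2 chunks, the function returns [["CheckoutTest.php", "ProductTest.php", "UserTest.php"], ["CartTest.php", "OrderTest.php", "SearchTest.php"]], and each chunk has a total cost of 9. Server N would then run only chunk N.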
We also sped up deploys to production. The process used to be a series of automated steps executed by a human, and it sometimes took up to an hour. Once it was fully streamlined and automated, we were able to take a change to production in less than 10 minutes, including running the full test suite.
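The shape of a fast, low-risk deploy for a single change can be sketched as a small function. This is a hypothetical outline, not OpenSky's Jenkins or Fabric code: run_tests, deploy and health_check are placeholder callables standing in for whatever actually runs the suite, ships a tag and verifies production, and rollback is assumed to mean redeploying the previous known-good tag.

```python
from typing import Callable


def deploy_change(
    tag: str,
    previous_tag: str,
    run_tests: Callable[[str], bool],
    deploy: Callable[[str], None],
    health_check: Callable[[], bool],
) -> str:
    """Ship one tagged change: gate on tests, deploy, verify, roll back on failure.

    All three callables are hypothetical hooks, not a real tool's API.
    """
    # Gate: nothing reaches production unless the full suite passes for this tag.
    if not run_tests(tag):
        return f"blocked: tests failed for {tag}"
    deploy(tag)
    # Verify production after the deploy; on failure, redeploy the last known-good tag.
    if not health_check():
        deploy(previous_tag)
        return f"rolled back: {tag} -> {previous_tag}"
    return f"deployed {tag}"
```

For instance, with stub callables where the tests pass but the health check fails, deploy_change("v2", "v1", ...) deploys the hypothetical tag v2, then v1, and returns "rolled back: v2 -> v1". Because each deploy carries a single change, a failed check points at exactly one tag, and rolling back is just another deploy of the tag before it.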
A cultural change happened alongside all of this. Because our manual testing resources were limited, we required developers to include unit and functional tests with their changes. When fixing a bug, the first commit should be a test that demonstrates the failure, and the second commit should fix it.
Some of the tools we use are:
Jenkins - job management.
PHPChunkit - chunks up our PHPUnit tests and runs them in parallel across multiple servers.
pr-nightmare - an internal tool that coordinates work between JIRA, GitHub Enterprise and Jenkins. I hope to make it generic enough to open source one day, but it is not public yet.
Fabric - a Python library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.
How do we use branches today?
Once we had solid CI and CD processes, it was time to change how we worked with branches. We got rid of the develop and release branches, and all changes now branch off of production. Every change is tested and deployed in isolation. Branches are usually short-lived and tied to a single developer's work. We now deploy an average of 5 times per day and have deployed as many as 20 times in a single day. Our overall throughput went up, we have fewer issues in production, and our fear of change decreased.
Now new developers deploy to production on their first day on the job!
Screenshots
Screenshot of GitHub Enterprise:
Screenshot of pr-nightmare:
We are hiring!
If you are interested in working at OpenSky, contact me at [email protected]!