Starling Technology

How we made everyone a Release Engineer

Every software engineer faces the same challenge: how to make the code they’ve created available for use. Releasing software means different things to different people. It can be as ordinary as making code available to others, commonly on a platform like GitHub, or, in the case of companies providing a product, releasing their packaged code or executable into what most people refer to as production.

How does Starling structure releases?

Here at Starling, engineers are aligned to either an Engineering Group - a team of teams who work on a related set of services (think Account Management or Customer Service), or an Engineering Practice - a team of teams responsible for making every engineer’s task at Starling easier or more straightforward (examples include the Infrastructure team, or Developer Experience, the team we belong to).

Whether we are group or practice aligned at Starling, we all follow the same release process to production.

That release process is heavily informed by our Engineering Principles - and we would like to call out two of them which specifically guide our Release Culture:

  • We prefer small incremental changes delivered to production frequently.

  • Every part of our system and process should be automated.

Whilst the only indication most of our customers should have of something changing is when they have to update their app from the App Store or Google Play Store, Starling actually releases to production multiple times per day. Our releases fall into a few different categories, typically ‘Infrastructure’, ‘Tooling’, ‘Mobile’ and ‘Platform’, the last of which this post focuses on.

What is the Starling release theory?

At Starling we consider there to be two primary environments, DEMO and PROD.

  • DEMO is our production-like environment, which we treat as a simulation of production.

  • PROD is our real production environment that our customers and staff use.

Starling Release Flow

What makes up a Starling release?

As you will see from the above diagram, we deploy to our Demo environment as pull requests are merged to main - our implementation of Continuous Integration. After testing in the Demo environment, the changes are ready for production release!

A developer at Starling is free and encouraged to release not only their own work, but that of others too. That is, you don’t have to be in the Account Management group to release the latest changes to one of their services. We continuously collect data throughout the release process and try to address the pain points we discover, so that completing a release carries less and less overhead.

Starling has adopted containerisation for our software architecture. Each one of our services is packaged and containerised independently - and they can be released independently if need be. We typically release a whole engineering group (remember Account Management from earlier) at a time though. This helps to ensure we have consistency across all services, and provides a guarantee of them working together.
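To make the idea of a group-level release concrete, here is a minimal sketch, assuming each service records the version running in Demo and in Prod. The `Service` type, the `plan_group_release` function, the group identifiers and the service names are all hypothetical illustrations, not Engineering Portal's real data model.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Service:
    """Hypothetical view of one independently containerised service."""
    name: str
    group: str          # assumed identifier for the owning engineering group
    demo_version: str   # version currently running in Demo
    prod_version: str   # version currently running in Prod


def plan_group_release(
    services: Iterable[Service], group: str
) -> Dict[str, Tuple[str, str]]:
    """Return {service name: (prod_version, demo_version)} for every service
    in `group` whose Demo version differs from its Prod version."""
    return {
        s.name: (s.prod_version, s.demo_version)
        for s in services
        if s.group == group and s.demo_version != s.prod_version
    }


# Made-up example data: two Account Management services, one Customer Service one.
services = [
    Service("accounts-api", "account-management", "1.4.0", "1.3.2"),
    Service("statements", "account-management", "2.0.1", "2.0.1"),
    Service("chat", "customer-service", "0.9.0", "0.8.5"),
]

plan = plan_group_release(services, "account-management")
print(plan)
```

In this sketch a service whose Demo and Prod versions already match is simply left out of the plan, which is one possible design choice; a real group release could equally redeploy every service in the group.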

As an aside, we have several tools that check for stability and errors in all environments, as well as simulators that continually drive production-like traffic into our Demo environment. We want to avoid failure, but we aren’t afraid to try things: having the Demo environment configured like this allows us to experiment and fail quickly whilst developing, without impact to our customers.

To move beyond Demo we use our in-house tool ‘Engineering Portal’ (EP for short) to manage releases. EP is the central hub for all kinds of engineering operations, and it safeguards the quality and sanity of our releases. Because it connects to several third-party SaaS solutions and internal release-related utility services, it can be thought of as the funnel through which all code is pushed to production, or filtered out for another time. It is the tool that captures the technical detail of our changes, the approvals from the correct people, and the release metadata that regulators require banks to keep. It is our immutable audit of what software we release to production.

In product terms Engineering Portal is connected to:

  • our Git repositories so as to retrieve code and metadata

  • our Continuous Integration software to validate build statuses

  • our Orchestration tooling to notify approvers that they have a task to complete.
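The three integrations listed above can be pictured as three functions that a release tool calls in turn. The following is only a sketch under that assumption: `prepare_release`, `Commit`, `ReleaseCandidate` and the three callables are hypothetical stand-ins for the Git, Continuous Integration and orchestration integrations, and it assumes the people notified are the commit authors.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Commit:
    sha: str
    author: str


@dataclass(frozen=True)
class ReleaseCandidate:
    """Immutable snapshot of what was gathered for one service."""
    service: str
    commits: Tuple[Commit, ...]
    build_green: bool
    notified: Tuple[str, ...]


def prepare_release(
    service: str,
    fetch_commits: Callable[[str], Iterable[Commit]],  # stand-in for Git
    build_passed: Callable[[str], bool],               # stand-in for CI
    notify: Callable[[str, str], None],                # stand-in for orchestration
) -> ReleaseCandidate:
    """Gather commits, check every build, and notify approvers if all are green."""
    commits = tuple(fetch_commits(service))
    green = all(build_passed(c.sha) for c in commits)
    notified = []
    if green:
        # Assumption: each distinct commit author is asked to approve.
        for author in sorted({c.author for c in commits}):
            notify(author, f"Please review your changes to {service}")
            notified.append(author)
    return ReleaseCandidate(service, commits, green, tuple(notified))


# In-memory fakes in place of the real integrations.
fake_history = {
    "accounts-api": [Commit("a1", "alice"), Commit("b2", "bob"), Commit("c3", "alice")]
}
sent = []
candidate = prepare_release(
    "accounts-api",
    fetch_commits=lambda svc: fake_history[svc],
    build_passed=lambda sha: True,
    notify=lambda who, msg: sent.append(who),
)
print(candidate.build_green, candidate.notified)
```

Passing the integrations in as plain callables keeps the sketch self-contained and easy to test with fakes; it says nothing about how EP is actually wired together.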

How does Starling make a safe release?

Picking up our earlier example, whereby anyone can release anything to production: if we want to conduct a release, we have to go through three layers of safeguards: Commit Approval, Technical Approval and Business Approval, the scope of which becomes wider at each layer. (Note: these are different from code review, which happens before the entire release flow.)

Commit Approval usually comes from the commit author and focuses only on a particular commit. We mainly need to verify three things in the Demo environment: that the change (1) does not impair existing functionality on impacted services, (2) functions as expected, and (3) does not depend on future commits.

Technical Approval is granted by a set of nominated individuals (experts in the set of services being released) who review the batch of changes since the last production release. Key focuses at this layer include implicit conflicts between commits in the same release and potential breakage of backward compatibility. Potential risks and their impact radius are also highlighted for visibility. Once granted, it signifies that the release under review is technically sound!

So, why do we need an extra approval? Business Approval is here to manage our overall risk as a regulated bank. It is granted by one of the Engineering Leads, Engineering Directors and/or C-Suite level executives. Using the notes provided by the technical approver, they confirm that it is a ‘safe’ time to make a production release in terms of market conditions and other business factors. Imagine interrupting someone’s Prime Day purchases with a poorly timed update: what a disaster that could be!
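The three layers can be thought of as gates that open in order. Below is a minimal sketch of that idea, assuming each layer must be complete before the next can be granted; the `Release` class, its method names and the approver names are hypothetical and are not taken from Engineering Portal.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional, Set


@dataclass
class Release:
    """Hypothetical record of one release and its three approval layers."""
    commits: FrozenSet[str]                       # SHAs since the last prod release
    approved_commits: Set[str] = field(default_factory=set)
    technical_approver: Optional[str] = None
    business_approver: Optional[str] = None

    def approve_commit(self, sha: str) -> None:
        """Layer 1: one approval per commit, usually from its author."""
        if sha not in self.commits:
            raise ValueError(f"{sha} is not part of this release")
        self.approved_commits.add(sha)

    def approve_technical(self, approver: str, nominated: Set[str]) -> None:
        """Layer 2: a nominated expert reviews the whole batch."""
        if self.approved_commits != self.commits:
            raise PermissionError("every commit needs Commit Approval first")
        if approver not in nominated:
            raise PermissionError(f"{approver} is not a nominated technical approver")
        self.technical_approver = approver

    def approve_business(self, approver: str, leads: Set[str]) -> None:
        """Layer 3: a lead confirms it is a safe time to release."""
        if self.technical_approver is None:
            raise PermissionError("Technical Approval is required first")
        if approver not in leads:
            raise PermissionError(f"{approver} cannot grant Business Approval")
        self.business_approver = approver

    def can_roll(self) -> bool:
        """True only once all three layers have been granted."""
        return (
            self.approved_commits == self.commits
            and self.technical_approver is not None
            and self.business_approver is not None
        )


release = Release(commits=frozenset({"a1", "b2"}))
release.approve_commit("a1")
release.approve_commit("b2")
release.approve_technical("tess", nominated={"tess"})
release.approve_business("lee", leads={"lee"})
print(release.can_roll())
```

Attempting a later layer early, for example calling `approve_technical` before every commit is approved, raises `PermissionError` in this sketch rather than silently recording the approval.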

Finally, with all the approvals granted, we can commence the release from EP and ‘roll’ the production release. At this point EP also lets our colleagues know that a release is actively under way by posting a humorous GIF into a Slack channel (our Group CTO, Steve, has written more on that if you’re interested!). We continue to monitor the release with the variety of observability tools we have integrated and, in case anything unexpected does show up, we can always selectively roll back the service(s) concerned from EP without any hassle.
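Selective rollback is possible because each service is versioned and released independently. As a rough sketch, assuming we keep a record of the version each service ran before the release, reverting only the affected services might look like this; `roll_back` and the service names are hypothetical, not EP's actual interface.

```python
from typing import Dict, Iterable


def roll_back(
    deployed: Dict[str, str],
    previous: Dict[str, str],
    services: Iterable[str],
) -> Dict[str, str]:
    """Return a new service-to-version mapping in which only `services`
    are reverted to their previous version; everything else is untouched."""
    targets = list(services)
    missing = [name for name in targets if name not in previous]
    if missing:
        raise KeyError(f"no previous version recorded for: {missing}")
    reverted = dict(deployed)  # copy, so the input mapping is not mutated
    for name in targets:
        reverted[name] = previous[name]
    return reverted


before = {"accounts-api": "1.3.2", "statements": "2.0.1"}
after = {"accounts-api": "1.4.0", "statements": "2.1.0"}

# Only accounts-api misbehaves, so only it is reverted.
result = roll_back(after, before, ["accounts-api"])
print(result)
```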

Why does Starling perform releases in this way?

A study from the FCA identified some key practices for implementing technology change: (1) high levels of automation instead of manual practice, (2) well-established governance arrangements, (3) frequent releases to avoid big-bang scenarios. Engineering Portal helps us with all three of these! The three-layer approval works really well for traceability and accountability, which is a great confidence booster when making changes. It helps us integrate risk management into our day-to-day release process with high levels of automation and visibility, so that we can focus on releasing product to our customers frequently.

What do you think - can you see a way to further improve how we release? Is there something you’d like to know in more detail about releases at Starling? Let us know and we’ll be sure to consider it for future revisions to this article, or even a dedicated technical musing in a future post!