At OutSystems R&D, we use a trunk-based development pattern on Service Studio. This means that each developer divides their work into smaller pieces merged into the mainline or trunk, often several times a day. Like almost everything in software engineering, the correct answer to the question “Is this good or bad?” is “It depends.” Trunk-based development has its pros and cons, which we will address in another post (in the meantime, here’s a good discussion), but my main point is:
“Trunk-Based Development is a key enabler of Continuous Integration and by extension Continuous Delivery. When individuals on a team are committing their changes to the trunk multiple times a day, it ensures the codebase is always releasable on demand and helps to make Continuous Delivery a reality.”
Before We Start
This is vital to automated releases: you must have a releasable product. All. The. Time. Otherwise, what are you automatically releasing? There’s no release-ready product.
So, before we start, you have to ensure that your teams are disciplined and focused on keeping pipelines green as much as possible. Perhaps the best metric is “Given a number of commits, how many resulted in a green?” where a green means:
- It builds successfully.
- It passed all tests.
- Binaries are ready to be released.
The higher the number, the better. However, we had to start with a smaller step: at least one green per day. (We even have a saying at OutSystems about this: “One green a day keeps the manager away!”) Why? Because it’s much less aggressive than looking at every single commit. Teams just have to ensure that the pipeline is green at least once a day.
The above image illustrates the percentage of working days with at least one green in a given month. We call it the Green Minesweeper. For example, in May 2019, there were 22 working days, and we only failed two days, which means 20 out of 22 were green (91%). It took us some time, however, to achieve this.
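As a rough illustration, here is a minimal Python sketch of how a metric like this could be computed. The `PipelineRun` class, its fields, and the `green_day_ratio` function are hypothetical names chosen for the example, not our actual tooling, and the sample data only mirrors the May 2019 numbers above.

```python
from dataclasses import dataclass


@dataclass
class PipelineRun:
    day: int              # working-day index within the month (hypothetical)
    built: bool           # it builds successfully
    tests_passed: bool    # it passed all tests
    binaries_ready: bool  # binaries are ready to be released

    @property
    def is_green(self) -> bool:
        # A run is green only when all three conditions hold.
        return self.built and self.tests_passed and self.binaries_ready


def green_day_ratio(runs, working_days):
    """Fraction of working days that had at least one green run."""
    working = set(working_days)
    if not working:
        return 0.0
    green_days = {run.day for run in runs if run.is_green}
    return len(green_days & working) / len(working)


# 22 working days; days 7 and 15 never reached a green (made-up sample data).
working_days = range(1, 23)
runs = [PipelineRun(d, True, d not in (7, 15), d not in (7, 15)) for d in working_days]
runs.append(PipelineRun(3, False, False, False))  # a red run on an otherwise green day

ratio = green_day_ratio(runs, working_days)
print(f"{ratio:.0%}")  # 20 of 22 working days
```

Counting days instead of commits is what makes the rule forgiving: a red run on day 3 does not hurt the score as long as another run that day went green.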
Go Green or (Don’t) Go Home
We introduced rules like “You can’t go home if your commit is impacting a green,” or “You can’t commit at 5:00 p.m. and leave at 5:01 p.m. without commit feedback,” and even “No failing tests for more than X hours.” Nowadays, it’s second nature and part of everyone’s life: fixing failing tests is the top priority.
Note: You can obviously have a product ready to be released all the time without trunk-based development; I was just stating our reality.
A Manual Mess
At OutSystems R&D, we have two release stages: Dogfooding (DF) and General Availability (GA). At the DF stage, we launch a new version of Service Studio internally and make it broadly available to be used and tested across the company. During the GA stage, we make sure that the DF version reaches every customer.
Every Friday was “code freeze day,” which meant that all teams had until Friday to commit the necessary improvements or bug fixes on the next release.
On the following Monday, a Service Studio Team member and a Release Team member would check for overall alignment (release notes, Jira issues, binaries). They would also make the release available to DF users (where it stayed for a week before reaching GA) and manually update the auto-update endpoint.
Finally, they would send the release notes to be curated by the Product Content Team. This manual process took us around four hours every week.
As with every manual task, it was also prone to human error. From time to time, there were release problems — incorrect release notes, some missing issues, wrong binaries, etc. — which forced us to check for inconsistencies manually. How? Either by looking at the very few logs we had or asking teammates involved in all the tasks. You can see how the process of fixing a mistake along the road was complex and time-consuming.
At OutSystems, we use our own product to create our in-house apps, and the web app we use to control the release process is proof of that. The Release Center helps us automate several actions, thus simplifying the release process. However, not all aspects of the process were entirely automated, and some steps still required manual confirmation — along with pressing several buttons — for each release.
Automatic for the People
Our first and primary objective was to automate the DF release to reduce the weekly four-hour workload. That would allow our teams more time to focus on what we call the important, fun stuff — coding and creating new features.
However, we also realized that there were some smaller improvements we could make along the way, like having better logs, providing more feedback each time there was a change in an upcoming release, and automating the creation of release notes.
We started with the last of these. Every time developers commit a change to the trunk, they do it under a release that has already been created and is at the development stage. With that in mind, the developer writes a commit message stating the change and the Jira issue ID.
This lets an automation pick up the referenced issue and add a release label with the date on which that issue will be available. Each time a new release is sent to DF, we can then query all the Jira issues carrying its label and compile the release notes from them.
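To make the idea concrete, here is a small Python sketch of the commit-message side of that automation. It is only an illustration: the `release-YYYY-MM-DD` label format, the `RSS` project key, and all function names are assumptions, and the real automation talks to Jira rather than to an in-memory dictionary.

```python
import re
from datetime import date

# Jira keys look like "PROJ-123": an uppercase project key, a hyphen, a number.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def issue_keys(commit_message):
    """Return the distinct issue keys mentioned in a commit message, in order."""
    seen = []
    for key in ISSUE_KEY.findall(commit_message):
        if key not in seen:
            seen.append(key)
    return seen


def release_label(release_date):
    """Build a hypothetical label such as 'release-2019-05-27'."""
    return f"release-{release_date.isoformat()}"


def labels_for_commits(commit_messages, release_date):
    """Map every issue referenced by the commits to the release label."""
    label = release_label(release_date)
    labelled = {}
    for message in commit_messages:
        for key in issue_keys(message):
            labelled[key] = label
    return labelled


messages = [
    "RSS-101 Fix crash when opening the widget tree",
    "Refactor toolbar (RSS-102, RSS-101)",
    "Bump version number",  # no issue ID, so nothing gets labelled
]
print(labels_for_commits(messages, date(2019, 5, 27)))
```

A pattern this loose will also match strings such as "UTF-8", so a real implementation would likely restrict it to known project keys.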
Now that we had found a way to get all the issues and changes within a release, we could automate the release process to the DF stage. We did so by getting the initial and the final revisions of a release and then adding that information to the release itself. Using that range, we took a similar approach to create release notes and began querying SVN to get all revisions per branch encompassed in the release.
With all the information in one place, we only needed to obtain the green build, which contained the required range, and publish the binaries into the cloud.
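The build selection can be sketched roughly as follows. This assumes that a trunk build at SVN revision r contains every commit up to r, so the first green build at or after the release's final revision covers the whole range; whether our pipeline uses exactly this rule is not stated here, and `Build` and `pick_release_build` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Build:
    revision: int  # SVN revision the build was produced from
    green: bool    # built, passed all tests, binaries ready


def pick_release_build(builds: Iterable[Build], final_revision: int) -> Optional[Build]:
    """Return the earliest green build that includes the release's final revision.

    Assumption: a build at revision r contains all trunk commits up to r.
    Returns None when no green build covers the range yet.
    """
    candidates = [b for b in builds if b.green and b.revision >= final_revision]
    return min(candidates, key=lambda b: b.revision, default=None)


builds = [Build(100, True), Build(120, False), Build(125, True), Build(130, True)]
print(pick_release_build(builds, final_revision=118))  # the build at revision 125
print(pick_release_build(builds, final_revision=131))  # None: nothing to publish yet
```

Returning `None` instead of falling back to an older build keeps the automation from publishing binaries that miss part of the release.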
Automating something can be highly advantageous. On the other hand, it can also set us back and become somewhat dangerous for a business, especially when done carelessly. Errors introduced by automation propagate quickly and silently, which, in this case, means an unwanted release being delivered into the DF stage and potentially hurting customers if it reached GA.
To reduce any risks brought by full release automation, it was also imperative that we had a logging and feedback mechanism that would alert us if something went wrong.
For the logging mechanism, we just had to take advantage of existing logging actions and create a webpage to make those logs available to everyone interested.
Our Slack Sentinel
We use Slack at OutSystems as our main internal messaging program, making it the perfect platform to send feedback messages. This is why we implemented a Slack bot as a feedback mechanism to communicate any changes in a release. In addition, it serves as an alert each time automation fails. These two mechanisms allowed us to troubleshoot release problems and act before they reached GA on several occasions.
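For readers who want to try something similar, here is a minimal Python sketch of a release notification. It assumes a Slack incoming webhook, which accepts a JSON body with a `text` field; our own bot may be built differently, and the function names, message wording, and version number are made up for the example.

```python
import json
import urllib.request


def release_message(release, event, ok=True):
    """Build the JSON payload for a release notification or a failure alert."""
    prefix = ":white_check_mark:" if ok else ":rotating_light:"
    return {"text": f"{prefix} Release {release}: {event}"}


def send_to_slack(webhook_url, payload):
    """POST the payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Building the payloads needs no network access; sending requires a real webhook URL.
info = release_message("11.0.1", "published to DF")
alert = release_message("11.0.1", "publishing binaries failed", ok=False)
print(info["text"])
print(alert["text"])
```

Keeping message construction separate from sending makes the wording easy to test without touching Slack.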
Now that we had the DF release entirely automated, the last step was applying it to GA releases for a fully automated pipeline. This final step was much easier to accomplish because we already had the knowledge required, and the workload when launching a release from DF into GA is very low compared to that needed to make it available for DF. We simply had to copy and adapt some previously created actions.
However, during this last phase, we realized that all of that automation also required an easy way for the R&D teams to stop and start automated release processes or change release dates.
We achieved this using Business Process Technology (BPT) and Service Center timers. We set up a daily timer that would automatically run a BPT and check the planned release date in the configuration details. The BPT would then compare that date with the current one and trigger the release process if they matched. To make it entirely customizable, we added the possibility of changing the release date of each launch type on each configuration page.
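The logic of that daily check is simple enough to sketch in a few lines of Python. This is not BPT or a Service Center timer, only an illustration of the comparison they perform; `LaunchConfig`, `daily_check`, and the `enabled` switch are hypothetical names standing in for the configuration page.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class LaunchConfig:
    launch_type: str      # "DF" or "GA"
    planned_date: date    # editable on the configuration page
    enabled: bool = True  # stop/start switch for the automation (assumed field)


def daily_check(config: LaunchConfig, today: date, trigger: Callable[[str], None]) -> bool:
    """Run once a day: trigger the release only if automation is on and the date matches."""
    if not config.enabled or config.planned_date != today:
        return False
    trigger(config.launch_type)
    return True


triggered = []
df = LaunchConfig("DF", date(2019, 5, 27))
daily_check(df, date(2019, 5, 26), triggered.append)  # wrong day: nothing happens
daily_check(df, date(2019, 5, 27), triggered.append)  # planned day: release starts
df.enabled = False
daily_check(df, date(2019, 5, 27), triggered.append)  # paused: nothing happens
print(triggered)
```

Because the check only reads the configuration, moving a release is a matter of editing `planned_date`, and pausing it is a matter of flipping one flag.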
What Have We Achieved?
In the end, we got a fully automated release pipeline that would not only automate basic tasks but also bring some features like more logging and feedback.
Before this implementation, making a Service Studio release was a time-consuming task that only a handful of people could do. Right after implementing it, we started to feel a significant improvement in the delivery rate. With these improvements, the R&D engineers could efficiently perform something that was once very complex — hotfixes.
Previously, every time we needed to fix something in the GA Service Studio version and team members didn’t want to wait two weeks for their commit, the only option was to call a colleague from the Release Team and work side by side through the many steps, which took two engineers a whole workday.
After automating the creation of releases, we realized that we could tweak and simplify the hotfix process, reducing the time required to do it from one day to three hours.
Transforming the release process from something that only five people could do into something that everyone in R&D could do had two significant advantages:
- The handover of power and trust to every software engineer, which ultimately creates a feeling of belonging and motivates them to create a better product.
- The removal of bottlenecks.
Today, every team member is fully autonomous in the overall development of Service Studio, from the moment they write the first line of code to the moment they deliver it to the customer. But, more importantly, remember our Green Minesweeper math? We did that here too. We are now saving 26 days a year in an engineer’s life!
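One plausible way to arrive at that figure, assuming an eight-hour workday and 52 release weeks a year (both assumptions, since only the four weekly hours are stated above):

```python
HOURS_PER_WEEK = 4     # manual DF release effort described earlier
WEEKS_PER_YEAR = 52    # assumption: one release every week of the year
HOURS_PER_WORKDAY = 8  # assumption: standard eight-hour workday

hours_saved = HOURS_PER_WEEK * WEEKS_PER_YEAR
days_saved = hours_saved / HOURS_PER_WORKDAY
print(hours_saved, days_saved)  # 208 26.0
```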