If you’re into electrical engineering, you have surely heard of circuit breakers. They’re automatic electrical switches that interrupt the flow of current in case of an overload or short circuit, protecting the electrical circuit from damage. In software development, we also have the circuit breaker pattern, which performs a very similar job: protecting your app.
Application ecosystems are increasingly complex. On one side, companies are making an effort to split their portfolios into more manageable parts and deploy them as autonomous and independent microservices. On the other, the proliferation of highly specialized SaaS products has made API integration essential in any application development initiative.
Although APIs are great for creating very complete applications, they also expose your app to the slowness, misbehaviors, and unavailability of the systems from which you are consuming the APIs. That's where the circuit breaker pattern comes in handy.
Context: Framing the Problem
Imagine the following scenario. You are developing a Customer 360 application to allow the users to check the data from your company’s customers. That precious information is stored in the company’s CRM and, therefore, the most used screen of your application is the one that lists customers and allows you to select the one you’d like to see details about.
So far, so good. You use your CRM’s API to get that data, and life is beautiful. Your users are happy, the sun shines in the sky, birds fly by, and you sleep like a baby.
But, one night the phone rings. There’s a problem with your app, and nobody can access it. You rush to your laptop, and you go around the application logs. The error is there in plain sight: the CRM is unresponsive, and all requests to their APIs are timing out after 30 seconds.
Your app is down because, well, another app is down. And this is a core problem of any distributed system: a failure in one system quickly propagates to all the dependent systems that use its APIs.
When talking about APIs, timeouts are a sensitive topic. In a scenario like the one I described, you have potentially millions of customers making requests to your app, only to wait 30 long seconds before concluding that the service is down. Such behavior provides a terrible user experience and exhausts your server resources, since every pending request holds on to them until the timeout happens.
Solution: Enter the Circuit Breaker Pattern
This is where the circuit breaker pattern can help. The pattern can be used in multiple situations, but in this scenario, I’ll focus on applying circuit breakers to REST APIs.
The concept of the circuit breaker pattern, popularized by Michael Nygard in his book Release It!, is pretty simple. You create a component that all API calls go through and that continually monitors for failures. In case of a timeout failure, the circuit breaker moves from the closed to the open state, and all further calls to the API don't reach the external system. This way, you save your infrastructure from being stalled waiting for a service that you already know is down, and your users from a terrible experience.
Let’s consider the simplest OutSystems API I can think of. Considering the previous description of a circuit breaker mechanism, we can conclude that we need to execute operations before and after an API invocation.
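To make the closed and open states concrete, here is a minimal Python sketch of the idea. The names SimpleBreaker and CircuitOpenError are made up for illustration, and the sketch deliberately opens the circuit on the very first timeout, with no threshold and no recovery.

```python
class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class SimpleBreaker:
    """Hypothetical, minimal breaker: closed until the first timeout."""

    def __init__(self):
        self.is_open = False

    def call(self, func, *args, **kwargs):
        if self.is_open:
            # Fail fast: the external system is never contacted.
            raise CircuitOpenError("circuit is open; call skipped")
        try:
            return func(*args, **kwargs)
        except TimeoutError:
            # A timeout trips the breaker, then the error is re-raised.
            self.is_open = True
            raise
```

Once the breaker is open, callers get an immediate CircuitOpenError instead of waiting for another timeout.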
For that, we’ll encapsulate the API invocation in a dedicated action that will be responsible for all the validations and updates of the circuit breaker before and after the call.
I will now drill down on what actions we need to perform in each one of these steps.
Putting a Circuit Breaker Pattern Into Action
The algorithm below illustrates the implementation of the circuit breaker concept in OutSystems. Still, it will likely need to be tailored to your needs and address other challenges to make it production-ready. Take this as a guideline on how to implement this pattern, and as a foundation for you to build on. All the logic that I’m showing from this point on is available for you to download on the OutSystems Forge.
Before we access the API, we need to check if the circuit breaker is in the closed state, which means that the API requests can go through.
Once the request is made, we need to update the circuit breaker status. If the API invocation is successful, we update the circuit breaker to ensure it stays closed. But if the API request fails, we need to handle the API exception, identify the exception as a timeout, and update the circuit breaker status.
At this point, you may be thinking, “what about the CircuitBreaker_Setup action?”. The setup step is nothing more than an operational step. Its goal is to configure the circuit breaker and define the threshold associated with the number of consecutive timeout errors, beyond which we’ll consider the external system as unresponsive. This is a very fast idempotent action, so I prefer to put it in this main flow and execute it at each API request, instead of spreading the circuit breaker setup logic across the application.
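For readers who think better in code than in flow diagrams, the setup, check, call, and update sequence described above could be sketched in Python like this. It is an approximation under assumptions, not the OutSystems implementation: the function names mirror the three actions, but the dictionary storage, the CircuitOpenError exception, and the default threshold of 3 are all hypothetical, and the stubs leave out the recovery logic.

```python
from dataclasses import dataclass


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the API call is skipped."""


@dataclass
class BreakerState:
    error_threshold: int
    consecutive_timeouts: int = 0
    is_open: bool = False


# Stand-in for the breaker storage (hypothetical; keyed by endpoint).
_breakers = {}


def circuit_breaker_setup(endpoint_key, error_threshold):
    # Idempotent: an existing breaker is left untouched.
    if endpoint_key not in _breakers:
        _breakers[endpoint_key] = BreakerState(error_threshold)


def circuit_breaker_is_closed(endpoint_key):
    return not _breakers[endpoint_key].is_open


def circuit_breaker_update(endpoint_key, has_timeout):
    state = _breakers[endpoint_key]
    if not has_timeout:
        state.consecutive_timeouts = 0
        state.is_open = False
        return
    state.consecutive_timeouts += 1
    if state.consecutive_timeouts >= state.error_threshold:
        state.is_open = True


def call_with_breaker(endpoint_key, api_call, error_threshold=3):
    """Run api_call behind the breaker: setup, check, call, update."""
    circuit_breaker_setup(endpoint_key, error_threshold)
    if not circuit_breaker_is_closed(endpoint_key):
        raise CircuitOpenError(f"{endpoint_key} is unavailable")
    try:
        result = api_call()
    except TimeoutError:
        # Only timeouts count against the breaker in this sketch.
        circuit_breaker_update(endpoint_key, has_timeout=True)
        raise
    circuit_breaker_update(endpoint_key, has_timeout=False)
    return result
```

Calling the setup inside the wrapper keeps the configuration in one place, which is the same trade-off described above.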
So far, this is pretty straightforward. The magic happens in the module that contains those three server actions:
These 3 actions are part of a module, called (surprisingly) CircuitBreaker.
This is what the CircuitBreaker_Setup logic looks like:
This action receives just two parameters. The first one is the EndpointKey, which serves as the circuit breaker key, since each API needs its own circuit breaker. The second one is the ErrorThreshold, which defines the number of consecutive timeout errors after which the circuit breaker must open, so no more calls are made to the external system.
The action logic is pretty straightforward and can be described as a simple “create if it doesn't exist.”
I’ll refrain from going into those CircuitBreaker_Exists and CircuitBreaker_Set actions. Suffice it to say that they are the ones responsible for storing and querying the circuit breaker data. In this particular implementation, the circuit breaker persistence is being handled in the ASP.NET memory cache. Depending on the complexity or performance requirements of your use cases, you may want to use other mechanisms, such as an in-memory database like Redis.
Now, let’s move on to the CircuitBreaker_IsClosed action. Its logic is very simple, but we need to add one more feature to handle the scenario where we check the status of the circuit breaker and it’s open.
If we did nothing, once open, the circuit breaker would stay open forever. To avoid that, we need to add a mechanism that allows us to periodically test the status of the API, as shown in the image below.
In case of a successful invocation, the circuit breaker switches to the closed state. Otherwise, it remains open, and the circuit breaker will continue to periodically allow test invocations, waiting for the external system to become available again.
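Here is one way to sketch that periodic test in Python. The retry_after_seconds parameter and its 60-second default are assumptions of mine, as is the choice to restart the waiting window whenever a test call is let through. The now argument exists only to make the function easy to exercise with fixed timestamps.

```python
import time
from dataclasses import dataclass


@dataclass
class BreakerState:
    is_open: bool = False
    last_error_at: float = 0.0  # time.monotonic() value of the last timeout


def circuit_breaker_is_closed(state, retry_after_seconds=60.0, now=None):
    """Return True when the API call is allowed to go through."""
    if not state.is_open:
        return True
    now = time.monotonic() if now is None else now
    if now - state.last_error_at >= retry_after_seconds:
        # Let one test call through and restart the waiting window, so
        # other requests keep failing fast until the next window.
        state.last_error_at = now
        return True
    return False
```

Note that the function never closes the breaker itself. It only lets a test call through, and the update step decides whether the breaker closes or stays open.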
Lastly, we have the CircuitBreaker_Update action, which is responsible for updating the circuit breaker status according to the result of the API call.
This action has no magic or secret sauce. Its focus is to update the status of the circuit breaker based on the action input that indicates if a timeout occurred or not.
In case of a successful call, the circuit breaker remains closed, and the error counter is reset. In the case of a timeout, the action checks whether the number of consecutive timeout errors has reached the threshold. If so, the last error timestamp is stored, and the circuit breaker is opened, avoiding the cascading effect of an unresponsive system.
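The update rules above translate into a few lines of Python. Treat this as a sketch under assumptions: the field names are hypothetical, a success also closes a breaker that was open (matching the test invocation behavior described earlier), and I compare with "greater than or equal" so the breaker opens exactly on the Nth consecutive timeout.

```python
import time
from dataclasses import dataclass


@dataclass
class BreakerState:
    error_threshold: int
    error_count: int = 0
    is_open: bool = False
    last_error_at: float = 0.0


def circuit_breaker_update(state, has_timeout, now=None):
    """Update the breaker after an API call finished or timed out."""
    if not has_timeout:
        # Success: reset the consecutive error counter and close.
        state.error_count = 0
        state.is_open = False
        return
    state.error_count += 1
    if state.error_count >= state.error_threshold:
        # Threshold reached: remember when, and open the breaker.
        state.last_error_at = time.monotonic() if now is None else now
        state.is_open = True
```

With a threshold of 3, two timeouts followed by a success leave the breaker closed with a clean counter, while three timeouts in a row open it.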
Delivering the Best Possible Experience for Your Users
Failure is an integral part of any distributed system, and we need to prepare our systems and applications for that. Remember that night when the phone rang because the Customer 360 application was down? “It’s the CRM’s fault” is not an acceptable answer.
As software engineers, we have the responsibility to ensure the applications we develop can handle failure, and the circuit breaker pattern is another tool to ensure that.
Your app may not be able to deliver what the user expects if an API is missing. And that’s another fundamental part of embracing failure: deliver an awesome experience even when you can’t give the user what they came for.