Building a Fort tells the story of how OutSystems Sentry became a reality. If you want to join us at the beginning, I strongly encourage reading Building a Fort: The OutSystems Sentry Security Compliance Process and Building a Fort: The OutSystems Sentry Toughest Technical Challenges.
Monitoring OutSystems Sentry is a big challenge because not only is the cyber threat landscape continuously evolving, but we are also monitoring a platform where customers build a diverse ecosystem of applications without our knowledge. Some of these applications serve hundreds of thousands of users all over the world, while others serve just a handful of users in a specific country or city, meaning that what one OutSystems customer perceives as normal may represent a threat to another.
So, we needed to make sure that our monitoring system would be able to recognize and adapt to new usage patterns for each OutSystems Sentry customer, while still performing traditional monitoring as normal.
For the sake of simplicity, I have divided this post into two sections. The first describes our approach to identifying, implementing, and operating the alerts. The second tells a real story of implementing, operating, and tuning an alert that can recognize and adapt itself to each OutSystems customer at a large scale.
Building, Testing, and Operating Alerts
In a previous post, José Casinha described a 24x7 Computer Security Incident Response Team (CSIRT) that is responsible for reacting to alerts and keeping OutSystems Sentry Customers safe from threats.
To accomplish this goal, we needed to make sure that the following properties applied:
- All alerts must have a priority.
- The false positive alerting rates are close to 0 percent.
- Each alert has proper documentation and clear procedures for reaction.
For these properties to be true, we decided to take an IT Infrastructure Library (ITIL) approach for designing, implementing, testing, operating and ensuring continuous improvement of alerts.
Each time we add a new alert to our monitoring system (we call it the Alert Catalogue), the first step is to decide its priority (Urgent, High, Medium, or Low).
So, we created an objective approach where we assigned weights to the key properties of security (confidentiality, integrity, availability, and authenticity). From there, we matched our internal taxonomy with these properties and determined the impact should a threat materialize. We were then able to create a scoring matrix that crosses the taxonomy with the properties and gives us the impact and an initial priority.
An alert is then assigned to one or more taxonomies to determine its final priority. For example, an alert that detects tampering of the core configurations of the OutSystems platform matches three taxonomies: “Misuse or unauthorized use of resources,” “Improper access to data or systems,” and “Unauthorized modification of information.” Having the taxonomies matched, we can then derive the final alert priority.
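The scoring logic described above can be sketched in a few lines. This is an illustrative model only: the property weights, the aggregate thresholds, and the priority bands below are assumptions for the example, not the real values in the OutSystems matrix.

```python
# Hypothetical scoring matrix: each taxonomy entry carries weights for the
# key security properties -- confidentiality (C), integrity (I),
# availability (A), and authenticity (Au). Values are illustrative.
TAXONOMY_IMPACT = {
    "Misuse or unauthorized use of resources":  {"C": 2, "I": 1, "A": 2, "Au": 1},
    "Improper access to data or systems":       {"C": 3, "I": 1, "A": 0, "Au": 3},
    "Unauthorized modification of information": {"C": 0, "I": 3, "A": 1, "Au": 2},
}

def alert_priority(taxonomies):
    """Cross the matched taxonomies with the property weights and map
    the aggregate impact score to a priority band (bands are assumed)."""
    score = sum(sum(TAXONOMY_IMPACT[t].values()) for t in taxonomies)
    if score >= 15:
        return "Urgent"
    if score >= 10:
        return "High"
    if score >= 5:
        return "Medium"
    return "Low"

# The configuration-tampering alert from the example matches all three
# taxonomies, so it lands in the highest band.
priority = alert_priority([
    "Misuse or unauthorized use of resources",
    "Improper access to data or systems",
    "Unauthorized modification of information",
])
print(priority)  # Urgent
```

The point of the matrix is that priority falls out of the taxonomy match mechanically, so two analysts classifying the same alert arrive at the same priority.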
After defining the priority, it was time to design and document our response procedures so the CSIRT would know precisely what to do and how to react if the alert is triggered. Therefore, the next step was to identify which pieces of information we needed to implement the alert. Typically, each piece of information is converted into a single atomic finding called a building block, which is then correlated to build the alert.
Once we have the priority defined, the response procedure documented, and the required information identified, we enter the Alert Transition phase, where we implement, test, and tune the alert.
After successful testing and acceptance, we move it to Alert Operation, where our CSIRT receives the alert and takes action according to the defined procedure.
Every time this team finds a false positive or any other issue with the alert or response procedure, they document it and discuss it in a weekly meeting for tuning or correcting the alert or procedure in a continuous alert improvement cycle.
By adopting this approach, we can keep adding new alerts while not forgetting to improve the ones we already have.
Adaptable Predictive Alerts
The best example I can share about Adaptable Predictive Alerts that follows the methodology of the continuous alert improvement described previously is our distributed denial of service (DDoS) detection alert, which is currently on its third iteration.
Today, we have 237 sensors distributed all over the planet, which means a lot of information to process. However, this was not always the case. When OutSystems Sentry started, we had just a few sensors in Europe, which provided us with the information we used to build our first DDoS alert.
In the first iteration of the alert, we started by identifying the pattern of the traffic we were receiving and created an alarm that triggers when traffic diverges from that pattern by more than 50% over a defined period. This alert worked fine for some time, with an almost 0% false positive rate.
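The first-iteration check amounts to comparing observed traffic against a fixed baseline. A minimal sketch, assuming a per-interval baseline in requests per minute (the window shape and sample values are made up for the example; the post only specifies the 50% threshold):

```python
def deviates(baseline, observed, threshold=0.5):
    """Return True when observed traffic diverges from the identified
    baseline pattern by more than `threshold` (50%) in any interval
    of the defined period."""
    return any(
        abs(obs - base) > threshold * base
        for base, obs in zip(baseline, observed)
    )

baseline = [100, 120, 110, 130]   # requests/min learned from the early sensors
normal   = [ 95, 130, 105, 140]   # everyday fluctuation, within bounds
surge    = [ 98, 125, 400, 600]   # sudden spike on two intervals

print(deviates(baseline, normal))  # False
print(deviates(baseline, surge))   # True
```

The weakness, as the next paragraphs show, is that the baseline is static: every new sensor and every new customer application shifts what "normal" looks like, and a fixed pattern starts flagging legitimate traffic.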
However, as time went by, we added more sensors and started to receive more information. We began to see an increase in false positive alert rates. At the time, we were still detecting threats but with a high cost of false positives. In a weekly meeting, one of our analysts shared the issue, and we went back to the drawing board.
In the second iteration, we decided not only to identify the current traffic patterns but also to understand how users were employing the applications built by our OutSystems Sentry customers. We created one building block for aggregating abnormal traffic patterns and another for aggregating abnormal user behaviors. This allowed us to correlate the two within a time window and detect potential threats preemptively. After testing, we deployed the alert into operation and saw false positives drop back to an almost 0% rate, which was pretty good.
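The second-iteration correlation can be sketched as pairing the two kinds of building blocks by time. Everything here is hypothetical except the idea itself: the 10-minute window and the timestamp-only findings are simplifications for the example.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)  # assumed correlation window

def correlate(traffic_blocks, behavior_blocks, window=WINDOW):
    """Pair each abnormal-traffic finding with every abnormal-behavior
    finding observed within `window`; each pair is a potential threat
    that escalates to an alert."""
    return [
        (t, b)
        for t in traffic_blocks
        for b in behavior_blocks
        if abs(t - b) <= window
    ]

traffic  = [datetime(2019, 5, 1, 12, 0)]                 # abnormal traffic spike
behavior = [datetime(2019, 5, 1, 12, 4),                 # abnormal user behavior
            datetime(2019, 5, 1, 13, 30)]                # unrelated, too late

print(len(correlate(traffic, behavior)))  # 1
```

Requiring both building blocks to fire together is what cut the false positives: a traffic spike alone (a marketing campaign, say) no longer triggers an alert unless user behavior looks abnormal at the same time.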
Once again, as time went by, we added more sensors to the field and got more information showing an increase in the false positive rates. For the third time, we went back to the drawing board.
This time, instead of mapping the current traffic patterns, we decided the monitoring system should adapt automatically to the evolving traffic patterns over time. With this approach, we aimed to create a new set of building blocks that continuously adjust to new traffic patterns.
The picture above shows the traffic pattern in blue and our prediction with an upper bound of 50 percent in yellow. Every time the traffic surpasses the boundary, a building block is generated and correlated with the OutSystems Sentry customers’ user patterns. If an anomaly is detected, an alert triggers.
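One simple way to build such an adaptive prediction is an exponentially weighted moving average that keeps learning from in-bound traffic. This is a sketch in the spirit of the third iteration, not the actual model: the post only specifies the 50% upper bound, so the smoothing factor and the poisoning guard below are assumptions.

```python
def adaptive_anomalies(traffic, alpha=0.3, bound=0.5):
    """Return the indices where traffic surpasses the adaptive
    prediction by more than `bound` (the 50% upper bound)."""
    prediction = traffic[0]
    anomalies = []
    for i, value in enumerate(traffic[1:], start=1):
        if value > prediction * (1 + bound):
            anomalies.append(i)  # generate a building block for correlation
        else:
            # Learn only from traffic inside the boundary, so the
            # baseline is not poisoned by the attack itself.
            prediction = alpha * value + (1 - alpha) * prediction
    return anomalies

traffic = [100, 110, 120, 500, 115, 125]  # requests/min, one sudden spike
print(adaptive_anomalies(traffic))  # [3]
```

Because the prediction tracks the traffic, slow organic growth (new sensors, new customer applications) raises the baseline instead of raising alarms, while a sudden surge still breaches the bound and produces a building block.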
At the time I’m writing these lines, this third iteration is still in the implementation phase, but it is already showing quite promising results. We hope to have it thoroughly tested and promoted to operation before this post is published.
The Great Achievement
Monitoring an environment where the OutSystems customers can build diverse ecosystems of applications is a great challenge that technology alone would not be able to solve. By adopting a continuous alert improvement strategy, we guarantee that we have a permanently evolving monitoring system with no alert left behind.
Go build your apps securely and may the Fort be with you!