Sentry is more than a product; it’s a service. In the first article of the Building a Fort: The OutSystems Sentry series, José Casinha shared how Sentry came to meet rigorous compliance requirements, ensuring that we deliver a secure foundation and service as part of the fort. In this post, I’m going to dig deeper into how we addressed the three main technical challenges of building such a fort.
1. Deriving Technical Requirements From SOC 2 Criteria
Securing the fort was a challenge. To meet our security needs, we first had to define the requirements, and it’s vital that technical requirements be very specific. Ambiguous, non-technical requirements leave a lot of room for interpretation and can lead to incomplete implementations, or to overdone implementations with features that are completely inadequate.
On the other hand, adopting a one-size-fits-all requirement defined by a single compliance standard brings its own set of problems. Applying a common requirement to products or services that differ in implementation or usage can lead to different outcomes. It’s like having just one T-shirt size: it never fits everyone well. Our cloud platform-as-a-service (PaaS) offering is tailored, so a standard solution might not be appropriate.
The industry standard here is SOC 2, a reporting framework for auditing how an organization manages customer data. It is based on the Trust Services Criteria (TSC) and is used to evaluate an organization’s information systems. The criteria are grouped into five categories: privacy, security, availability, processing integrity, and confidentiality. SOC 2 also defines the requirements for evaluating the operational controls of information systems across five system components: infrastructure, software, people, procedures, and data. On top of this, it defines a set of control criteria to be assessed and reported in the attestation of systems or entities. So the first big challenge was answering the following questions.
How do we derive the technical non-functional requirements for our cloud PaaS from the SOC 2 criteria when our PaaS is a unique, custom-built product designed to address our customers’ particular needs?
Could we still rely on the industry standards to address all the technical requirements for complying with the SOC 2 standard?
To find out, we started mapping the SOC 2 criteria to tangible technical requirements using a structured approach. We identified how each trust services criterion would map to technical requirements and to other aspects of our product. It was meticulous, large-scale work, so to make sure we checked all the boxes, we defined a matrix to analyze the correlation between the five trust services criteria categories and the five system components. Next, we evaluated how each criterion would impact our product.
For example, under the common criteria (criteria shared by all five TSC categories), Threat Identification requires that “The entity (1) identifies potential threats that would impair system (...) commitments and requirements, (...) analyzes the significance of risks associated with the identified threats, and (...) determines mitigation strategies for those risks (including controls and other mitigation strategies).” So we assessed each component against each criterion and identified the system gaps accordingly, as shown in this table.
After applying this analysis to every criterion, we ended up with about 40 tables identifying all the technical gaps. We aggregated these technical requirement gaps into 10 areas of concern: architecture, authentication and authorization, auditing, business continuity plan, change management, disaster recovery, hardening, human resources, monitoring and operation, and user privacy.
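The category-by-component matrix can be sketched in a few lines of Python. This is a minimal illustration, not the tooling the team used: `assess` is a hypothetical callback standing in for the manual evaluation of one category against one component, and the toy assessment is invented for the example.

```python
from itertools import product

# The five Trust Services Criteria categories and the five system components.
TSC_CATEGORIES = ["privacy", "security", "availability",
                  "processing integrity", "confidentiality"]
COMPONENTS = ["infrastructure", "software", "people", "procedures", "data"]


def build_matrix(assess):
    """Return {(category, component): finding} for all 25 pairs.

    `assess` is a hypothetical callback: it returns a short gap description,
    or None when the component already satisfies the category.
    """
    return {
        (category, component): assess(category, component)
        for category, component in product(TSC_CATEGORIES, COMPONENTS)
    }


def gaps(matrix):
    """Keep only the cells where the assessment found a gap."""
    return {cell: finding for cell, finding in matrix.items() if finding}


# Toy assessment (invented): only two data-related cells have gaps.
def toy_assess(category, component):
    if component == "data" and category in ("confidentiality", "privacy"):
        return f"no {category} controls defined for {component}"
    return None


matrix = build_matrix(toy_assess)
found = gaps(matrix)
```

Enumerating every pair with `itertools.product` is what guarantees no cell is skipped, which is the point of using a matrix rather than an ad hoc checklist.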
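A minimal sketch of the aggregation step, assuming each analysis table has been reduced to (gap, area) pairs tagged by the analyst. The ten area names come from the list above; the gap descriptions are hypothetical examples.

```python
from collections import defaultdict

# The ten areas of concern named in the article.
AREAS = [
    "architecture", "authentication and authorization", "auditing",
    "business continuity plan", "change management", "disaster recovery",
    "hardening", "human resources", "monitoring and operation", "user privacy",
]


def aggregate(gap_tables):
    """Group gaps from many per-criterion tables by area of concern.

    Each table is a list of (gap_description, area) pairs; the area tag is
    assumed to be assigned by the analyst while filling in the table.
    """
    by_area = defaultdict(list)
    for table in gap_tables:
        for description, area in table:
            if area not in AREAS:
                raise ValueError(f"unknown area of concern: {area!r}")
            # The same gap often surfaces under several criteria; keep it once.
            if description not in by_area[area]:
                by_area[area].append(description)
    return dict(by_area)


# Hypothetical gaps from two of the tables.
tables = [
    [("no central audit log for operator sessions", "auditing"),
     ("shared admin accounts on servers", "authentication and authorization")],
    [("no central audit log for operator sessions", "auditing"),
     ("no documented failover procedure", "disaster recovery")],
]
areas = aggregate(tables)
```

Rejecting unknown area tags keeps the taxonomy closed, so every gap lands in exactly one of the ten buckets that later feed the user stories.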
Now that we had a clear picture, we could identify the user stories that would meet each area of concern. From that point on, it was a matter of designing, estimating, and prioritizing each of the 200 user stories to be implemented by R&D and the OutSystems Security Office.
2. Ensuring the Operational Requirements
A critical operational aspect in systems security is how operational teams access the systems for troubleshooting and recovery. To be sure we chose the best approach, we identified the system requirements that would meet our operational needs, which were:
- Allow access to thousands of servers and databases scattered across different AWS regions.
- Scale to more than 100 operations engineers working concurrently, with proper tracing and sandboxing for each engineer.
- Provide support for all diagnostic and troubleshooting tools (debuggers, dump analyzers, etc.) needed by our operations engineers.
- Ensure secure, traceable access from anywhere in the world.
- Provide 24/7 availability.
Would Serverless Jumpboxes Be a Fit?
The industry standard for granting operational teams access to a system is based on jumpboxes. Jumpboxes are secure bastion servers or machines used as stepping stones between two security zones. They allow access to the target system and are protected by strict network restrictions. When we evaluated jumpbox solutions, we found that they carried an infrastructure management overhead, significantly increasing development and maintenance costs without adding enough value.
We needed a unified solution, something that wouldn’t need an IT department to manage it. A single bastion host for multiple operators, or a server per operator, didn’t meet our scalability requirement. On the other hand, installing troubleshooting and analysis tools inside the systems would adversely affect the development, maintenance, and operations of these systems.
AWS WorkSpaces as Serverless Jumpboxes?
We needed an alternative with lower development and maintenance costs. We were intrigued by the idea of using AWS WorkSpaces as serverless jumpboxes. WorkSpaces (a managed, secure cloud desktop service) met our needs, required less effort, and was almost turnkey. With WorkSpaces we’d get:
- Secure access from anywhere in the world (multi-factor authentication over encrypted channels)
- Scalability with cost control: on-demand provisioning, pay-as-you-go pricing
- Complete freedom to install software using WorkSpaces Image Bundles or WorkSpaces Application Manager
- An AWS CloudFormation template to instantiate the WorkSpaces network and authentication (Microsoft AD with RADIUS servers), plus the AWS Management Console for bulk user provisioning
This was pretty close to what we needed, so we decided to go with AWS WorkSpaces. The CloudFormation template that instantiates the WorkSpaces architecture based on AWS best practices includes:
- An Amazon Virtual Private Cloud (VPC)
- A Microsoft Active Directory (AD) deployed into two private subnets
- Two RADIUS servers for multi-factor authentication
- Two public subnets with NAT gateways (one per subnet, for redundancy) and Elastic IP addresses
- WorkSpaces internet traffic routed through the NAT gateways
- A software bundle that included all the software required for access, data collection, and troubleshooting.
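The network and directory part of that layout can be approximated with a short Python script that emits a CloudFormation template as JSON. This is a simplified sketch under stated assumptions, not the AWS-provided template: the CIDR ranges and logical resource names are hypothetical, and the RADIUS servers, the WorkSpaces themselves, and the software bundle are omitted. The resource types used (`AWS::EC2::VPC`, `AWS::EC2::NatGateway`, `AWS::DirectoryService::MicrosoftAD`, and so on) are standard CloudFormation types.

```python
import json

# Hypothetical address plan; adjust to your own network ranges.
VPC_CIDR = "10.0.0.0/16"
PUBLIC_CIDRS = ["10.0.0.0/24", "10.0.1.0/24"]
PRIVATE_CIDRS = ["10.0.10.0/24", "10.0.11.0/24"]


def ref(name):
    return {"Ref": name}


def subnet(cidr, az_index, public):
    # Place subnet N in the Nth Availability Zone of the stack's region.
    return {"Type": "AWS::EC2::Subnet",
            "Properties": {"VpcId": ref("VPC"), "CidrBlock": cidr,
                           "AvailabilityZone": {"Fn::Select": [az_index, {"Fn::GetAZs": ""}]},
                           "MapPublicIpOnLaunch": public}}


def default_route(table, target_key, target):
    return {"Type": "AWS::EC2::Route",
            "Properties": {"RouteTableId": ref(table),
                           "DestinationCidrBlock": "0.0.0.0/0",
                           target_key: ref(target)}}


def build_template():
    res = {
        "VPC": {"Type": "AWS::EC2::VPC",
                "Properties": {"CidrBlock": VPC_CIDR, "EnableDnsSupport": True,
                               "EnableDnsHostnames": True}},
        "InternetGateway": {"Type": "AWS::EC2::InternetGateway"},
        "GatewayAttachment": {"Type": "AWS::EC2::VPCGatewayAttachment",
                              "Properties": {"VpcId": ref("VPC"),
                                             "InternetGatewayId": ref("InternetGateway")}},
        "PublicRouteTable": {"Type": "AWS::EC2::RouteTable",
                             "Properties": {"VpcId": ref("VPC")}},
    }
    res["PublicDefaultRoute"] = default_route("PublicRouteTable", "GatewayId", "InternetGateway")
    res["PublicDefaultRoute"]["DependsOn"] = "GatewayAttachment"
    for i in (1, 2):
        res[f"PublicSubnet{i}"] = subnet(PUBLIC_CIDRS[i - 1], i - 1, True)
        res[f"PrivateSubnet{i}"] = subnet(PRIVATE_CIDRS[i - 1], i - 1, False)
        # One NAT gateway, with its own Elastic IP, per public subnet for redundancy.
        res[f"NatEip{i}"] = {"Type": "AWS::EC2::EIP", "DependsOn": "GatewayAttachment",
                             "Properties": {"Domain": "vpc"}}
        res[f"NatGateway{i}"] = {"Type": "AWS::EC2::NatGateway",
                                 "Properties": {"AllocationId": {"Fn::GetAtt": [f"NatEip{i}", "AllocationId"]},
                                                "SubnetId": ref(f"PublicSubnet{i}")}}
        # Each private subnet sends internet-bound traffic through its own NAT gateway.
        res[f"PrivateRouteTable{i}"] = {"Type": "AWS::EC2::RouteTable",
                                        "Properties": {"VpcId": ref("VPC")}}
        res[f"PrivateDefaultRoute{i}"] = default_route(f"PrivateRouteTable{i}", "NatGatewayId", f"NatGateway{i}")
        for kind, table in (("Public", "PublicRouteTable"), ("Private", f"PrivateRouteTable{i}")):
            res[f"{kind}Subnet{i}Routes"] = {"Type": "AWS::EC2::SubnetRouteTableAssociation",
                                            "Properties": {"SubnetId": ref(f"{kind}Subnet{i}"),
                                                           "RouteTableId": ref(table)}}
    # AWS Managed Microsoft AD spanning the two private subnets.
    res["Directory"] = {"Type": "AWS::DirectoryService::MicrosoftAD",
                        "Properties": {"Name": ref("DirectoryName"),
                                       "Password": ref("DirectoryPassword"),
                                       "Edition": "Standard",
                                       "VpcSettings": {"VpcId": ref("VPC"),
                                                       "SubnetIds": [ref("PrivateSubnet1"), ref("PrivateSubnet2")]}}}
    return {"AWSTemplateFormatVersion": "2010-09-09",
            "Parameters": {"DirectoryName": {"Type": "String"},
                           "DirectoryPassword": {"Type": "String", "NoEcho": True}},
            "Resources": res}


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

Saving the output to a file and deploying it with `aws cloudformation deploy` (supplying the directory name and password as parameters) would create the network and directory; WorkSpaces launched into the private subnets would then reach the internet only through the NAT gateways.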
3. Ensuring Secure On-Demand Access
Another technical challenge we had to overcome was managing operational user pools across thousands of systems scattered over several regions and types of technology. We had to meet the security requirements for user authentication, authorization, and accounting, but with little maintenance effort. Authentication across different technologies (Windows, Linux, SQL Server, Oracle, OutSystems, and Splunk) can be integrated using a single sign-on (SSO) platform or software, but we wanted a stricter authentication model than SSO would give us.
On-Demand Access, But Only When Necessary
We have high expectations, so we decided to raise the bar on the security requirements and ensure that operations engineers have access to the PaaS systems only when strictly necessary, and only for a short time. We also decided to provide extended auditing information associated with operational tickets. Our requirements were:
- The system should be easily accessed and used by the operations engineers.
- Operations engineers don’t have direct access to assets (from a jumpbox) by default.
- An operations engineer has access to an asset only when it’s required within the scope of an operational task or ticket.
- Operations engineer accounts must be nominal (tied to a named individual), and their credentials must rotate periodically.
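Taken together, these rules amount to a deny-by-default check. A minimal sketch, assuming grants and open tickets are available in memory; `Grant` and `may_access` are hypothetical names, not part of Sentry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    engineer: str        # nominal (named) operations engineer
    asset: str           # server or database the grant covers
    ticket_id: str       # operational task or ticket that justifies access
    expires_at: datetime


def may_access(engineer, asset, grants, open_tickets, now=None):
    """Deny by default; allow only a live, ticket-backed grant for this asset."""
    if now is None:
        now = datetime.now(timezone.utc)
    return any(
        g.engineer == engineer
        and g.asset == asset
        and g.ticket_id in open_tickets
        and now < g.expires_at
        for g in grants
    )


now = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
grants = [Grant("alice", "db-eu-01", "OPS-123", now + timedelta(hours=2))]
```

With this shape, closing the ticket or letting the grant expire revokes access without anyone having to remember to clean up permissions.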
Relying on an SSO solution to manage authentication and permissions for more than a hundred engineers (and growing) across thousands of different assets would most likely lead to an always-on, all-access policy as a way to simplify management. We just couldn’t accept that, nor the extra costs in infrastructure and possibly in software.
After looking at several alternatives, we started leaning toward the notion of creating access credentials on-demand. And, while we continued to iterate on that idea, we discovered that we could meet all our security requirements with a low-cost and scalable solution.
On-Demand Credentials…With an Expiration Date!
To address our system access requirements, we extended our automation to generate temporary credentials on demand. This let us put extended controls in place, such as an approval workflow or linking the reason for access to an open ticket. For each credential request, we could define an expiration date, select the permission level, and enable file transfer. All of this comes with full auditing, and access is possible only from the operational serverless jumpboxes.
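A minimal sketch of issuing one on-demand credential, assuming an in-memory set of open tickets and an in-memory audit log. The per-request options (expiration, permission level, file transfer) and the approval and ticket checks mirror the paragraph above; everything else, including the class and function names, the four-hour default TTL, the username format, and the password alphabet, is hypothetical. Creating the account on the actual asset, and restricting logins to the jumpboxes, is left to technology-specific automation that is not shown.

```python
import secrets
import string
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical alphabet: 52 letters + 10 digits + 10 punctuation = 72 symbols.
ALPHABET = string.ascii_letters + string.digits + "!#%+-.:=@_"
PASSWORD_LENGTH = 24


@dataclass(frozen=True)
class CredentialRequest:
    engineer: str                        # nominal identity of the requester
    asset: str                           # target server or database
    ticket_id: str                       # operational ticket justifying access
    permission_level: str = "read-only"  # hypothetical level name
    file_transfer: bool = False
    ttl: timedelta = timedelta(hours=4)  # hypothetical default lifetime


@dataclass(frozen=True)
class TemporaryCredential:
    username: str
    password: str
    asset: str
    permission_level: str
    file_transfer: bool
    expires_at: datetime


def issue_credential(request, approver, open_tickets, audit_log, now=None):
    """Create a short-lived, single-asset credential after approval."""
    if request.ticket_id not in open_tickets:
        raise PermissionError(f"ticket {request.ticket_id} is not open")
    if approver is None or approver == request.engineer:
        raise PermissionError("request needs approval from someone else")
    if now is None:
        now = datetime.now(timezone.utc)
    credential = TemporaryCredential(
        # Random suffix keeps usernames unique while staying traceable to the engineer.
        username=f"tmp-{request.engineer}-{secrets.token_hex(4)}",
        # Cryptographically random password; no human ever chooses it.
        password="".join(secrets.choice(ALPHABET) for _ in range(PASSWORD_LENGTH)),
        asset=request.asset,
        permission_level=request.permission_level,
        file_transfer=request.file_transfer,
        expires_at=now + request.ttl,
    )
    # Audit everything about the grant except the secret itself.
    audit_log.append({
        "event": "credential_issued", "engineer": request.engineer,
        "approver": approver, "ticket": request.ticket_id,
        "asset": request.asset, "user": credential.username,
        "permission_level": request.permission_level,
        "file_transfer": request.file_transfer,
        "expires_at": credential.expires_at.isoformat(),
    })
    return credential


now = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
log = []
cred = issue_credential(CredentialRequest("alice", "db-eu-01", "OPS-123"),
                        approver="bob", open_tickets={"OPS-123"},
                        audit_log=log, now=now)
```

Using Python’s `secrets` module rather than `random` matters here: it draws from the operating system’s cryptographically secure source.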
This was quite the breakthrough. The users are created by automation tasks, either event-driven or via APIs, depending on the type of asset. The benefits just piled up:
- It’s not technology-dependent, so we can easily evolve to accommodate new kinds of technology.
- There are no user-dependent costs that increase with the number of users.
- When the credential expires, the user is deleted, so there’s no need for password rotation.
- There are no complex network architectures to ensure SSO or federated authentication schemas across the regions.
- There’s no need to store the temporary credentials in a password manager (we’ve improved the UI so that logging in as a temporary user is just a copy and paste away).
- We don’t need to manage user permissions at the asset level.
- It solves a lot of system access level security compliance requirements.
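Because every temporary user carries an expiration date, cleanup can be a periodic sweep instead of password rotation. A minimal sketch, assuming an in-memory map of active users and a hypothetical per-technology `delete_user(asset, username)` callback:

```python
from datetime import datetime, timedelta, timezone


def sweep_expired(active_users, delete_user, audit_log, now=None):
    """Delete every temporary user whose credential has expired.

    active_users maps username -> (asset, expires_at) and is updated in place.
    delete_user(asset, username) is a hypothetical, technology-specific
    callback (for example, dropping a database login or removing an OS account).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    expired = [user for user, (_, expires_at) in active_users.items() if expires_at <= now]
    for user in expired:
        asset, _ = active_users[user]
        delete_user(asset, user)  # if this raises, the user stays tracked for the next sweep
        del active_users[user]
        audit_log.append({"event": "credential_expired", "user": user,
                          "asset": asset, "at": now.isoformat()})
    return expired


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
users = {"tmp-alice-1a2b": ("db-eu-01", now - timedelta(minutes=5)),
         "tmp-bob-3c4d": ("web-us-02", now + timedelta(hours=1))}
deleted, log = [], []
removed = sweep_expired(users, lambda asset, user: deleted.append((asset, user)), log, now)
```

Injecting `delete_user` keeps the sweep itself technology-neutral, which matches the point that new kinds of assets only need their own create and delete automation.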
We reduced the attack surface of the cloud and machines to a single short-lived user per machine, with a managed, computer-generated password, instead of hundreds of users with human-chosen passwords, which are known to be weak.
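To put a rough number on the strength of a computer-generated password: assuming the generator draws $L = 24$ characters uniformly at random from an alphabet of $N = 72$ symbols (both figures are illustrative assumptions, not Sentry’s actual policy), its entropy is

```latex
H = L \log_2 N = 24 \log_2 72 \approx 24 \times 6.17 \approx 148 \text{ bits}
```

so a random-guessing attacker faces on the order of $2^{148}$ candidates, whereas human-chosen passwords are far from uniformly distributed and are typically much easier to guess.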
A Custom Solution for our PaaS Offering
This has been years in the making. We’ve been thinking about and discussing systems security for a long time, because it’s vital that we find the best solution to guarantee the safety of our PaaS offering. Centralized authentication systems are a high-value target for attackers because, when compromised, they allow access to a large number of systems: compromise even one user, and everything that user can access is vulnerable. Because of this, we decided to build a tailored solution for our cloud PaaS offering. Our SOC 2 auditors highly praised it, and our cloud operations team uses it daily with a high level of satisfaction.
Challenging the Status Quo
This was a massive undertaking, and I’ve only shared the tip of the iceberg of the innovations we achieved while developing Sentry. I chose to highlight the technical challenges that I found the most difficult to solve; they were also the ones I enjoyed the most. This was the most demanding project I’ve had the chance to be part of. It allowed us to innovate in areas where standards usually mandate complex and static solutions. And this is the heart of what Sentry is about: challenging the status quo when common solutions don’t fit our custom-built PaaS offering, while still bringing in industry standards when it makes strategic sense.
In the end, we provide our customers with more flexible operations on top of a highly complex and compliant system. OutSystems Sentry brings an immense sense of satisfaction to everyone involved, because we know we delivered a solution that complies with the highest security standards.