Challenges of Moving From Relational to Non-Relational Databases
This is the story of how we moved from relational to non-relational databases. As with every story, we should start at the beginning.
OutSystems supports the development of mobile applications since 2016. Back then, we introduced the ability to store data on mobile devices — and not only on servers. This gave OutSystems developers a choice between keeping data on servers or mobile devices.
The answer to addressing this data move was to rely on local storage entities. These are entities that behave just like database entities yet on the client-side. And if you need to query them, all you have to do is use OutSystems aggregates. Then, the platform does the hard work for you with a relational database engine.
So far, so good. However, a bigger problem would arrive in 2020, when we needed to start using a database supported by all major browsers and no relational option was available.
Now that we’ve set the stage, let’s go into the ins and outs of how we got this issue solved. But first, a quick definition of aggregates. The year is still 2016.
What Aggregates Are
They're a visual, easy-to-use approach to represent, combine, and filter data from multiple entities in OutSystems. They also abstract the type of underlying database.
With aggregates, OutSystems creates the corresponding optimized relational database queries. That means that developers don’t need to write any SQL code. Isn’t that awesome?
But how did we pull it off? How does the platform create optimized relational database queries for local storage entities? Well, by planting a relational database engine on the user's mobile device.
How to Query Local Storage Entities with OutSystems
First, each OutSystems mobile app runs in a native shell. Therefore, to implement a relational database engine in a user's mobile device, the native shell included an SQLite plugin. That plugin allows executing queries to local storage entities similarly to how you'd query a relational database.
But here's the catch. While OutSystems mobile apps always execute inside native shells, developers can create and test their apps on Chrome. So, we needed to provide them with a way to query local storage entities on the browser.
That's when the Web SQL implementation provided by Chrome came on the scene. Although deprecated, OutSystems relied on it to develop its relational solution anyway. Why? Because the primary use of the browser was the development and testing in computers. The app users would interact with the OutSystems mobile apps on their mobile devices instead. Also, Chrome didn't provide an alternative to Web SQL.
Finally, the OutSystems compiler started to dump SQL queries from aggregates, not only for SQL Server and Oracle— server-side — but also for SQLite and Web SQL — client-side. Despite all of them using SQL as a query language, they're different systems. For example, SQLite doesn't support right-joins.
If you're curious enough and want to learn more about this, take a look at this research.
Now it's time to fast forward four years.
Fast Forward to 2020
Since OutSystems 11, the platform includes mobile apps distribution through progressive Web apps (PWAs). PWAs don't require any native capabilities themselves and rely on Web standards to use the device's native capabilities.
Because of PWAs, OutSystems stopped using Chrome solely for development and test purposes. Instead, it started using the browser also to distribute PWAs. And as PWAs don't run on native shells, SQLite was no longer an option. Consequently, we had to start using the databases supported by Chrome, but that was a problem.
The Problem
Remember that Web SQL was already deprecated in Chrome back in 2016? Besides that, Apple dropped it on all versions of iOS since iOS 13, and there's no relational database currently supported by all major browsers.
To tackle this problem, we needed an alternative that still allowed us to store significant amounts of structured data. The choice was IndexedDB, which is a non-relational engine.
That was a huge challenge! To support IndexedDB and keep supporting operations on aggregates — such as join and group by — the OutSystems architects had to design an abstraction layer for the underlying database.
Unlike a relational database, operations that relate data in multiple tables — such as join — are not something that non-relational databases offer out-of-the-box. We had to implement those operations from scratch, including all their flavors, such as inner, right, and left joins. As you can imagine, that demands a great piece of engineering.
The Basics of IndexedDB
Besides being non-relational, IndexedDB is a JavaScript document-oriented database. That means that instead of storing data in tables — as relational databases do — IndexedDB stores data on JavaScript documents. And in those documents, records are JavaScript objects.
Storing data as JavaScript objects is a great advantage of non-relational databases, as it allows more storage flexibility and capacity. However, non-relational databases have a significant drawback: they can’t perform complex queries like join and group by.
To ensure correspondence with all operations on OutSystems, let's consider read and write. The latter is very straightforward to implement, as IndexedDB provides native APIs to put, add, and delete data. As to read operations, IndexedDB only provides the getAll native method. That means that OutSystems generates at compile-time and executes at runtime all the other operations to filter, sort, join, and group data.
Understanding Aggregates
As mentioned before, an aggregate is a visual representation, in OutSystems, of data from a data source. You can use it to create a set of operations to combine and filter those data
The representation of an aggregate is a tree in which:
- Each node — the parent — represents specific operations — or transformations — that the user might perform over each result — or child.
- Each leaf is always a DataSource.
The tree's height must be one to four. For instance, the tree's height is one when you don't need to apply any transformations to the results of querying a data source. On the other hand, the tree's height is four when you need to group results by a specific attribute, sort them, or calculate some attributes.
An aggregate tree can have three types of nodes:
- AddSources: This type of node represents the retrieval of data from a data source. When used alone in an Aggregate, it allows retrieving all the data from the corresponding data source. Also, this node is always a leaf to create an alias for the original name of the data source.
- CombineSources: This type of node represents fundamental transformations that you can perform over a set of results. You cannot use it alone in an aggregate. Plus, it allows you to filter, sort, and add new attributes to results and join data from different data sources.
- GroupBy: This type of node represents more complex transformations. Again, you cannot use it alone in an aggregate. Additionally, it allows you to group results by a specific attribute and execute aggregation functions over those results — sum, count, min, max, and other kinds of aggregation.
By using aggregates, OutSystems supports the non-relational model with its own solution. That means that the OutSystems developers can carry on using the aggregates they already used.
Okay, you already know what IndexedDB and the OutSystems aggregates are. You should be wondering now how to use IndexedDB and yet keep using aggregates. If so, the next section will help you.
From Aggregates to IndexedDB Code
If you add several joins and other conditions to an aggregate, it can get really tricky.
To solve the problem linked to generating IndexedDB code for basic aggregates, we adopted an incremental approach. From there, the compiler code evolved, and the generated IndexedDB code grew in complexity. The IndexedDB API supported the whole process.
To help you understand the generated code, let’s use a simple data model composed of two entities: Country and Employee.
The Country entity is the list of available countries, and it stores their names and the region they belong to — for instance, Europe, the Middle East, and Africa (EMEA). As for Employee, it stores the name, email, and date of birth of employees, as well as the country where they're based.
Step 1: Get All the Data on Employees
To get all the data from the Employee entity, you need the following aggregate:
But how does the OutSystems relational database engine translate that aggregate into SQL? First, it gets the corresponding object store for the Employee entity:
var Employee_ObjectStore = transaction.objectStore("OSUSR_3SY_EMPLOYEE");
Then, you can create a request to fetch all the data available on that store and use the onsuccess callback to process those data.
var Employee_GetRequest = Employee_ObjectStore.getAll();
Employee_GetRequest.onsuccess = function (event) {
event.target.result.map(function (Employee_Value) {
// Add Employee_Value to the final result
result.append(Employee_Value);
});
};
Select *
from Employee
Step 2: Get the Employees' Year of Birth
Now, instead of retrieving the data from the store as-is, let's retrieve the employees' year of birth only. To do so, a CombineSources node filters the employees' data, which results in another aggregate.
The IndexedDB code for this step is similar to the previous one. The only difference is calculating each employee’s year of birth before appending it to the final result.
Employee_GetRequest.onsuccess = function (event) {
event.target.result.map(function (Employee_Value) {
// Calculate the year of birth
resultRecord["birthYearAttr"] = OS.BuiltinFunctions.year(Employee_Value.birthDateAttr);
resultRecord["employeeAttr"] = Employee_Value;
result.append(resultRecord);
});
}
Select Employee.*, Year(BirthDate)
from Employee
Pretty straightforward, huh? However, things change when you start relating entities. That's when the generated IndexedDB code does most of the work.
Step 3: Get the Ages of EMEA-Based Employees
Now, imagine you want to get the ages of all employees based in countries from the EMEA region. It will take three steps:
- Combine all the data from the Employee entity with the data from the Country entity.
- Filter those data by the countries that belong to the EMEA region.
- Calculate the employee’s age for each retrieved result.
You would end up with this aggregate:
But what about fetching the data from both stores and only calculating the year of birth for employees based in the EMEA region? Take one of the following approaches:
- First, load all the data from both stores into memory, and then iterate through all of those data and filter them to get the right ones.
- Use the join condition to fetch filtered data instead of all the data from both stores. This alternative combines the IndexedDB interfaces IDBObjectStore and IDBKeyRange.
The second approach is best due to its efficiency. It retrieves fewer data from the stores and uses native capabilities to filter those data. The IDBKeyRange allows specifying a value to filter an entity by a specific attribute.
In addition to the tailored use of the IndexedDB API, we took one step further. To reduce the number of retrieved records, the order to iterate over entities is critical. Therefore, we optimized that process, as you can read next.
Optimizing Iterations Over Entities
For the example of employees and countries, and to fetch the data, two alternatives are on the table:
- Get data from the Employee entity and keep only those from employees based in the EMEA region.
- Start by retrieving all the data from the Country entity, filter them by region — EMEA in this case —, and then get the data from the corresponding Employee entity.
The second alternative is better than the first as it allows fetching a smaller amount of data. Why? Because data filtering occurs early and at each step of the way by choosing only countries in the EMEA region and, for each country, only employees who live in it.
Such an alternative requires using the aggregates' filters and join conditions and then ordering the sources — filtered sources first. As the engine fetches data, you can apply more filters to those data. The result is an ordered list of sources and filters that you can use to bring data from those sources.
Let's go through the algorithm applied to the running example of employees and countries.
The Iteration Algorithm
It starts with defining three collections:
- The list of all sources
- The set of all filters paired with the sources on which to apply those filters
- The list of sources fetched to that point
Sources: [Country, Employee]
Filters: [(“Employee.CountryId = Country.Id”, [Employee, Country]),
(“Country.Region = ‘EMEA’”, [Country])]
Fetched: []
Given that no sources exist when executing the algorithm for the first time, you can choose any selected filter. And as the second one from the example only depends on the value of one source — Country —, that's the best choice.
By applying the filter, you get data about the current country where each employee is based. So, the source Country goes into the list of fetched sources.
Sources: [Employee]
Filters: [(“Employee.CountryId = Country.Id”, [Employee, Country])]
Fetched: [Country]
Then, you should use the remaining filter. The result is the final list of sources.
(Country, “Country.Region = ‘EMEA’”), (Employee, (“Employee.CountryId = Country.Id”))
The next step is to run the getAll() method on each store following their order. In the process, a filter allows creating an IDBKeyRange to filter the data upon retrieval from the database.
var Country_ObjectStore = transaction.objectStore("OSUSR_3SY_COUNTRY");
var Country_idxValue = "EMEA";
Country_ObjectStore = Country_ObjectStore.index("Region");
var Country_GetRequest = Country_ObjectStore.getAll(IDBKeyRange.only(Country_idxValue));
Country_GetRequest.onsuccess = function (event) {
event.target.result.map(function (Country_Value) {
var Employee_idxValue = Country_Value.idAttr;
Employee_ObjectStore = Employee_ObjectStore.index("CountryId");
var Employee_GetRequest = Employee_ObjectStore.getAll(IDBKeyRange.only(Employee_idxValue));
Employee_GetRequest.onsuccess = function (event) {
event.target.result.map(function (Country_Value) {
// Calculate the year of birth and add it to the result
});
};
})
};
This approach only works for inner joins, which means that every result must match both sources. If you wished to include the employees without a specified country — such as remote workers —, you'd need a left join.
SELECT Employee.*, Country.*, Year(BirthDate)
FROM Employee e inner join Country c on e.CountryId = c.Id
WHERE Country.Region = “EMEA”
In case you're wondering what happens when joining unmatching sources, go ahead and read the next section.
Joins Without a Match
In OutSystems, when a join doesn’t find a match between a record from one entity — the left entity — and the other entity — the right entity —, the resulting record contains the default values for all attributes of the right entity. In other words, the platform adds an empty record to the query result to allow the match of the left entity with data from other sources apart from the right entity.
Despite sounding like a relatively easy thing to do, this approach requires generating code for an arbitrary number of sources. For that, the platform must mix multiple inner and left joins.
To ensure a proper data match with multiple sources, you should create a track record of matches on the right source for each record on a left source. In the running example, this could mean, for instance, to include remote employees in your query result. OutSystems uses the has_Country flag for that purpose.
var Employee_ObjectStore = transaction.objectStore("OSUSR_3SY_EMPLOYEE");
var Employee_GetRequest = Employee_ObjectStore.getAll();
Employee_GetRequest.onsuccess = function (event) {
event.target.result.map(function (Employee_Value) {
var has_Country = false;
var Country_ObjectStore = transaction.objectStore("OSUSR_3SY_COUNTRY");
var Country_idxValue = Employee_Value.countryIdAttr;
Country_ObjectStore = Country_ObjectStore.index("Id");
var Country_GetRequest = Country_ObjectStore.getAll(IDBKeyRange.only(Country_idxValue));
Country_GetRequest.onsuccess = function (event) {
event.target.result.map(function (Country_Value) {
// Calculate the year of birth and add it to the result
has_Country = true;
});
if (!has_Country) {
var Country_Value = new CountryRec();
// Calculate the year of birth and add it to the result
}
};
});
};
SELECT Employee.*, Country.*, Year(BirthDate)
FROM Employee e left join Country c on e.CountryId = c.Id
WHERE Country.Region = “EMEA”
Step 4: Calculate the Average Age of Employees Per Country in EMEA
To implement the GroupBy, create groups of records that share the same grouping attribute value. To do so, add the fetched data to the corresponding group. How?
Index each group of records with a unique key calculated from the attributes used in the GroupBy operation. After executing that operation, update the values required by the aggregation function. In this case, these are the number of employees and their age sum; you need them to calculate the average.
After processing the data, add each group you previously created to the final result of the grouping operation.
var groups = {};
// Fetch countries filtered by Region = "EMEA"
Country_GetRequest.onsuccess = function (event) {
event.target.result.map(function (Country_Value) {
// Fetch Employees filtered by Employee.CountryId = Country.Id
Employee_GetRequest.onsuccess = function (event) {
event.target.result.map(function (Employee_Value) {
// Calculate the year of birth
var key = generateKey({
Name: Country_Value.nameAttr
});
var group = groups[key];
if (!group) {
group = {
nameAttr: Country_Value.nameAttr,
avgCountTemp: 0,
avgSumTemp: 0,
ageAttr: 0
};
groups[key] = group;
}
group.avgCountTemp += 1;
group.avgSumTemp += resultRecord["birthYearAttr"];
group.ageAttr = group.avgSumTemp / group.avgCountTemp;
});
};
});
for (let groupKey in groups) {
result.append(groups[groupKey]);
}
};
SELECT AVG(Year(BirthDate)), Country.Name
FROM Employee e inner join Country c on e.CountryId = c.Id
WHERE Country.Region = “EMEA”
GROUP BY Country.Name
Step 5: Discard countries with an Average Employee Age Over 40
Although the operation is the same as the one you used to join and filter data — CombineSources —, you cannot use the filters while fetching the data. You can only apply them once you have the grouping's final result.
Therefore, to apply the filters, you just need to do some post-processing of the grouping results. And the last operation should be to sort the final results. To do it in descending order, you must negate the result of the compare function.
var groups = {};
var result = []
// Fetch countries filtered by Region = "EMEA"
Country_GetRequest.onsuccess = function (event) {
// Fetch employees for each country
// Calculate their age and update the Average for each Country
for (let groupKey in groups) {
result.append(groups[groupKey]);
}
}
result.filter(function (record) {
return record.ageAttr > 40;
});
result.sort(function (recordA, recordB) {
var res = compare(recordA.ageAttr, recordB.ageAttr);
return -res;
});
SELECT Country.Name, AVG(Year(BirthDate))
FROM Employee e inner join Country c on e.CountryId = c.Id
WHERE Country.Region = “EMEA”
GROUP BY Country.Name
HAVING AVG(Year(BirthDate)) > 40
ORDER BY AVG(Year(BirthDate)) DESC
Before You Go
If you seek inspiration to implement your own non-relational database engine with IndexedDB API, you should mind:
- The performance of your solution
- Its completeness
Consider enrolling in this OutSystems training course on aggregates. You’ll likely love it and learn loads about OutSystems aggregates.