Faster way to find records in an hierarchical data structure?

Back to Forums

Willem Schulte

Solved

Question

Performance

Hello colleagues,

I have a performance question.

I have a department table where there is a department id and a 'belongs to' department id. In this manner i'm able to create 'in theory' infinite sub departments in one table.

Now i need to get the sales records of a department and all of its sub departments accumulated to the initial department.

I created an 'GetDepartmentAndChildren' action to get a list of departments that hang of any given department.

so for instance a structure like this:

Main Company

? Department1

?Department2

?Subdepartment1

?SubSubdepartment1

?Subdepartment2

And so on.

I found a solution to get the sales for an entire tree, but its slow (involves an index search on a department names string in an aggregate)

What i do:

I get all departments in the tree in a list via the 'GetDepartmentAndChildren' And add them enclosed in pipe signs in a search string.

So i get something like:

"|Subdepartment1||Subdepartment2"

(The Pipesign is there to prevent partial department names generating a false positive)

Now in the Aggregate filter I use something like

SyntaxEditor Code Snippet

Index(DeparmentNamesSearch,"|" + Sales.Department + "|") <> -1

As said, this works but it's a bit slow (having many thousands of records for each department to wrestle through).

Have been thinking about creating a function that returns a ListIndexOf value but I also read somewhere that functions in an aggregate (especially of huge tables) are ill advised.

Did anyone encounter this issue and found a faster method to achieve the above mentioned?

Any help would be appreciated!

Regards,

Willem

04 Jan 2019

Daniël Kuhlmann

MVP

Solution

Hi Willem,

You could define an DepartmentCode attribute on you department entity that contains hierarchical string code that is unique for a department (make sure to make an index on it).

The DepartmentCode should be automatically generated and maintained by your (in your department CRUD wrappers, the code could be derived from id generated by OutSystems or another other unique key).

It's string looks something like this:

Id Code

1 "00001"

2 "00001.00002"

3 "00001.00002.00003"

4 "00001.00002.00004"

5 "00001.00005"

6 "00001.00005.00006"

7 "00001.00005.00007"

This code can be used to easily query the database for all the children of an department (recursively).

So if you filter on "00001.%" you would retrieve department with code 00001 and all its sub departments for all levels.

If you filter on "00001.00002.%" you would retrieve department with code "00002" and all its sub departments.

There are some rules for this code:

The code must be unique
All the children of the same department have codes that start with the parent department's code.
It's fixed length and based on the level of the department levels in the tree, the example shown allows for 3 levels deep.
you could create a site property that defines how deep the hierarchy can go.
While the department code is unique, it can be changeable if you move an department.
You must reference an department by Id, not by Department ode.

Hope this helps,

Regards,

Daniel

05 Jan 2019

Willem Schulte

Hi Daniel,

Excellent Idea! I rebuilt the preparation and added a 'hierarkey' to the departments. The load time of the screen went from > 10 seconds to < 2 seconds.

a few things I changed from your input are:

I created a key from a part of the GUID of the higher level department in combination with the milliseconds of the moment of creation of the key. This ensures a 99.99999% change on a unique key.

Was thinking about using the entire GUID as a key, but that would result in enormous key strings in the deeper levels.

Should a duplicate key ever occur, I catch the error of creating it and try again.

I don't see a reason for your fixed length rule. It will also work if the length would be variable.

Using your example a select on "00001.lalevel2.23.%" would work just fine.

I built the generation of the key into the save action of the department and just regenerate the entire department tree keys (This only takes half a second) since the departments don't change that often.

So thanks for the idea!

Regards,

Willem.

06 Jan 2019

Daniël Kuhlmann

MVP

Hi Wilem,

Great to read that my idea did help to improve the performance.

Regards,

Daniel

06 Jan 2019

Kilian Croese

MVP

When working with hierarchical data structure in relational databases I can highly recommend to study the following two presentations by Bill Karwin the author of the book "SQL Antipatterns: Avoiding the Pitfalls of Database Programming":

Practical Object Oriented Models In Sql

This presentation is relevant for everybody working with object Oriented Models in SQL, slides 39-69 talk about working with trees/hierarchies. It covers the following design patterns, with their pro's and cons:

Adjacency list
Path enumeration (proposed above)
Nested sets
Closure tables

On slide 69 (see below) you have a nice overview of all the patterns with there pro's and con's. Note that you can also use multiple patterns at the same time.

You can also have a look at the following presentation. This covers pretty much the same content as the presentation about, but is only focused on hierarchical data and ends with an example using PHP:

Models for hierarchical data

In the past I have use a combination of primarily the closure table design, combined with adjacency list and path enumeration. The addition of adjacency list was, to make it easier (and faster) to query the child's of a node. We used path enumeration because the sorting of nodes was important. The downside of path enumeration is that the depth of your tree is limited by the amount of characters you have available to store the path.

Some of the databases have database specific support for working with hierarchies, which is something I would not directly recommend to use.

07 Jan 2019

Community GuidelinesBe kind and respectful, give credit to the original source of content, and search for duplicates before posting.

See the full guidelines