17 Views · 5 Comments
Best performance option for an aggregate with advanced filters

Hi, I am developing a screen that fetches data from an aggregate with many joins so that the user can apply complex filters. Some of these joins are 1-N relationships (for example, a project with N technologies, or a project with N workers), so I get duplicate rows: I need the join for the filter, but I only show the user fields from the first table.

I was wondering what the best-performing option is in these cases. Following the OutSystems guidance I could use a GROUP BY, but in SQL that would be less efficient than a DISTINCT, which would force me to switch to a SQL query and drop the aggregate. How do you solve these types of cases?
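To make the scenario concrete, this is roughly the shape of the SQL behind the two options I am weighing (entity and attribute names are made up for illustration):

  -- 1-N join used only for filtering; only Project columns are shown,
  -- so every matching ProjectTechnology row duplicates the project.

  -- Option A: remove duplicates with DISTINCT
  SELECT DISTINCT p.Id, p.Name, p.StartDate
  FROM Project p
  INNER JOIN ProjectTechnology pt ON pt.ProjectId = p.Id
  WHERE pt.TechnologyId = @TechnologyId;

  -- Option B: remove duplicates with GROUP BY
  SELECT p.Id, p.Name, p.StartDate
  FROM Project p
  INNER JOIN ProjectTechnology pt ON pt.ProjectId = p.Id
  WHERE pt.TechnologyId = @TechnologyId
  GROUP BY p.Id, p.Name, p.StartDate;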

I do not have much experience with OutSystems, so maybe there are options I am not aware of.

Rank: #67

Hi CrisSanz,


Where did you see that DISTINCT is more performant than GROUP BY?

Actually, I think it is the other way around, as you can see in many articles and analyses online. For instance, this one makes a thorough comparison between the two and concludes that DISTINCT is less performant than GROUP BY.


From an aggregate perspective, I think there are a few things to take into account for performance:


1. Don't join tables you don't need - less is more

2. Use "Only With" joins (INNER JOIN) when possible rather than "With or Without" (LEFT or RIGHT JOIN)

3. Make sure that when joining tables you're doing so with keys or indexed fields.

4. Use MaxRecords if you can.

5. Make sure the fields used in your filters and sorting are also indexed (see the sketch after this list).
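As a minimal sketch of points 3 and 5 (in OutSystems you would normally define these indexes on the entity in Service Studio; the names below are hypothetical and shown as raw SQL only to illustrate the idea):

  -- Index the foreign key used in the 1-N join
  CREATE INDEX IX_ProjectTechnology_ProjectId
      ON ProjectTechnology (ProjectId);

  -- Index the column used in the screen's filter
  CREATE INDEX IX_ProjectTechnology_TechnologyId
      ON ProjectTechnology (TechnologyId);

  -- Index the column used for sorting
  CREATE INDEX IX_Project_Name
      ON Project (Name);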


Hope it helps.


Regards,

João

Rank: #4751

That is the debate. Perhaps in OutSystems, with the way aggregates are specialized, it is as you say and it is correct to have a complex aggregate with 1-N JOINs and filter out the duplicates with GROUP BY.

In SQL Server, GROUP BY is not a clause meant to remove duplicates; its purpose is to divide the result of the query into groups of rows.

As far as I understand, DISTINCT works like this:

  • Copy all business_key values from the table to a temporary table
  • Sort the temporary table
  • Scan the temp table, returning each item that is different from the previous one

GROUP BY could be executed as:

  • Scan the entire table, storing each business key value in a hash table
  • Return the keys of the hash table

The first method is optimized for memory usage; it would still work reasonably well when part of the temp table needs to be swapped to disk. The second method is optimized for speed, but potentially requires a large amount of memory if there are many distinct keys.
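In query terms, the two strategies described above correspond to these two statements, which return the same result set (SomeTable and business_key are placeholder names):

  -- DISTINCT form
  SELECT DISTINCT business_key
  FROM SomeTable;

  -- GROUP BY form
  SELECT business_key
  FROM SomeTable
  GROUP BY business_key;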

Rank: #67

Aggregates are just a visual representation of an SQL query that is going to run in the database, so you should not take the time Service Studio takes to render the preview results as the time the aggregate query takes to execute at runtime.


Given that, the evidence points to GROUP BY being faster, as you can see in the comparison taken from the article shared above.


In any case, there is no better evidence than comparing the execution plans of both approaches in SQL Server Management Studio and drawing your own conclusions.
If you are working on-premises, you should have this access, or you can contact your DBA to get it and run the experiment. If you are working on an OutSystems Cloud infrastructure, you can request these (read-only) database permissions by opening a support case with OutSystems.
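A minimal way to run that experiment (placeholder table names, reusing the Project/ProjectTechnology example from the question) is to enable the actual execution plan in Management Studio (Ctrl+M) and turn on the statistics output while running both variants:

  -- Show CPU/elapsed time and logical reads in the Messages tab
  SET STATISTICS TIME ON;
  SET STATISTICS IO ON;

  -- Variant 1: DISTINCT
  SELECT DISTINCT p.Id, p.Name
  FROM Project p
  INNER JOIN ProjectTechnology pt ON pt.ProjectId = p.Id;

  -- Variant 2: GROUP BY
  SELECT p.Id, p.Name
  FROM Project p
  INNER JOIN ProjectTechnology pt ON pt.ProjectId = p.Id
  GROUP BY p.Id, p.Name;

  SET STATISTICS TIME OFF;
  SET STATISTICS IO OFF;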


Regards,
João

Rank: #4751

Thanks for the note. I will ask my DBA to give me permission to see the execution plan and work on the query (we currently have a performance problem).

Rank: #67

If you are already struggling with performance issues, please also consider the bullet points I mentioned in my first comment, which may be adding more to the query duration than the GROUP BY vs DISTINCT choice.

If you are joining tables on multiple fields, make sure you have the appropriate composite indexes created on those fields and that the columns appear in the same order - it makes all the difference! You can read in this article what to bear in mind when creating those indexes and the impact it has.
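As a small sketch of that composite index idea (hypothetical ProjectWorker table, raw SQL only for illustration): if the join touches ProjectId and WorkerId, create one composite index listing both columns, keeping the column order consistent with how the query filters them:

  -- Join touching both ProjectId and WorkerId
  SELECT p.Id, p.Name
  FROM Project p
  INNER JOIN ProjectWorker pw
      ON pw.ProjectId = p.Id
     AND pw.WorkerId  = @WorkerId;

  -- One composite index on both join fields, in a consistent order
  CREATE INDEX IX_ProjectWorker_ProjectId_WorkerId
      ON ProjectWorker (ProjectId, WorkerId);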


Regards,
João