Huge Performance problem in OutSystems

I notice the ESPACE_VERSION system table has a BLOB field for the OML_FILE. Our OML files are 7+ megabytes each and our version table has 2,500+ rows in it for just ONE espace. That module alone consumes well over 20GB of data.

Each time an Aggregate hits that table, it brings back GIGABYTES of data.

a) why is this blob being stored IN the database row? Why not on the file system somewhere with a link to it? Do we ever need to Query on this field? This seems like an incredible waste of space.

b) How can we create an aggregate on a database table with a BLOB field and tell the SQL statement NOT to include the blob field?


I think this is leading to timeouts in our publishing, loading modules, references, etc. 


Let's take the DBCleaner in the Forge as an example. There is an aggregate in PruneVersionsByEspace that looks roughly like this:

That first 1 is hiding this:

Notice the "Fetching data...." when the OML_FILE field is hidden, the data is fetched in a couple seconds. With it exposed, it takes dozens of seconds.

Unfortunately, regardless of it being hidden or not, the SQL that goes to the database looks like this:

This means that each time we do something in OutSystems that hits that table, we are sending potentially GIGABYTES of data across the network.

No wonder it is slow! It is taking us upwards of 10 minutes to publish one module.

Of course you'll say, "Delete Old espace versions..." and yes, that'd be nice, except:

Why? gigabytes over the wire...



This should be fixed at the OutSystems level, IMO.

First thing is WHY are you accessing that table? That table contains a copy of the different versions of the actual espace OML code and I can't really see a reason why you would be accessing it directly in normal opperations.

Next, if you follow the Outsystems best practices for performance (see https://www.outsystems.com/learn/courses/46/master-class-on-performance-and-troubleshooting/ masterclass) then Outsystems automatically excludes unused fields from an aggregate so should not actually fetch that data unless you specifically reference it or you are doing one of the other actions that stops Outsystems from being able to optimize the aggregate (ie passing the results as a list to an action, referencing an aggregate from a Preparation from within a screen action without doing an explicit refresh etc (see the masterclass for more details)


WHY am I accessing that table? To make OutSystems not take 20 minutes to publish a module! I'm not even talking about OUR code here, I'm talking about OutSystems code. Unfortunately, I can't performance optimize Service Center such that it doesn't timeout when I try to "Publish all Consumers."

The code I'm looking at is in a Forge Component a very nice developer made to help customers of OutSystems keep their OutSystems databases performing well enough that OutSystems itself doesn't crash:

https://www.outsystems.com/forge/component-overview/423/dbcleaner

It seems crazy to me that we should have to do this kind of stuff anyway, yeah... this shouldn't be what we have to do on a NORMAL basis, but unfortunately, we do.

Why is it the first post in response to so many forum issues here basically equates to: You're doing it wrong!

I don't think we're doing it wrong. OutSystems is doing it wrong. Putting a 7MB blob in a database is WRONG.




Seriously??

The master class you linked to (thank you for that) says:

However, OutSystems ITSELF isn't doing that. This is the problem we are experiencing. They also aren't archiving their old versions into another table like "espace_version_archive" so that only the 100 most recent versions are stored in the ESPACE_VERSION table.

Let's assume it is necessary to store the OML_FILE in the db, then it should be like this:

ESPACE_VERSION -> ESPACE_VERSION_OML_FILE (id,espace_version_id,oml_file)

so when the aggregate gets all the data from ESPACE_VERSION, the OML_FILE is left on the server and doesn't clog the pipe.

What does this mean?


The narrator said column selection can't be optimized for columns retrieved, therefore ALL columns are retrieved. What do we do instead? It says what not to do, but not what TO do.

OMG, we use AJAX everywhere! Why is AJAX a problem?


Well, I did learn a few things from the performance master class, but unfortunately, nothing that talks about how to speed up 1-Click Publishing or managing ESPACE_VERSION.



As per the performance optimization master class, i'm updating the aggregate in DBCleaner to pull back only the ESPACE_VERSION.ID field instead of the OML_FILE with it and I get this:

aaaahhhH!!!!

*** PULLS HAIR OUT  ****

As long as the entity is added as a reference you should be able to use the entity virtual name instead of the physical name, ie select * from {Espace_Version} instead of select * from ossys_entity_version. They made a change a while back to stop you from referencing system tables that weren't directly linked to your escape.


BTW, I do agree about the size of the espace_version table, ours got to over 40 g before we started doing regular dbcleans, now it sits at around 13gb after cleans. But I do see why they do it, basically the system is designed to live totally from the database and everything is copied from the DB to the individual servers as required. Makes it easy to manage and backup but not exactly good DB practice (I'm an MSSQL DBA and had a small fit when i noticed what they were doing lol )

That being said we haven't found too much of a performance problem due to the versions. Most of the time we have had long publishes has been due to poor architecture where making a change to one espaces causes a huge number of different espaces to be rebuilt. By following the 4 layer architecture properly we have been able to cut our publishing times by well over half. Unfortunately there are still some very low level espaces that prompt a large rebuild when doing something as simple as changing a shared static entity. Can't wait until we can upgrade to V11 as I believe that particular issue has been fixed.

I'm throwing a fit right now!! :)

Thank you again! I updated that Aggregate to an advanced sql statement and it brings the records back fast. Per your advice, it looks like this now:


For copy/paste if anyone else wants to try to hurt themselves:

SELECT {Espace_Version}.[Id] 
FROM {Espace} 
    Inner JOIN {Espace_Version}  ON {Espace_Version}.[eSpace_Id] = {Espace}.[Id]
    Left JOIN {Module} ON {Espace}.[Id] = {Module}.[Espace_Id]
    Left JOIN {Solution_Version_Reference} ON {Espace_Version}.[Id] = {Solution_Version_Reference}.[Espace_Version_Id]
    Left JOIN {APP_VERSION_MODULE_VERSION} ON {Espace_Version}.[Id] = {APP_VERSION_MODULE_VERSION}.[ESPACE_VERSION_ID]
WHERE 
{Espace_Version}.[eSpace_Id] = @Espace_Id AND {Espace_Version}.[eSpace_Id] IS NOT NULL
AND {Espace_Version}.[Id] <> {Espace}.[Version_Id] AND {Espace_Version}.[Uploaded_Date] < @OlderThan 
AND {Solution_Version_Reference}.[Espace_Version_Id] IS NULL AND {APP_VERSION_MODULE_VERSION}.[ESPACE_VERSION_ID] IS NULL

Hi PJ ,


You are right, that entity does not follow the best practices and it should. It's a legacy problem that predates the best practices and is hard to solve without a breaking change on the public usages of that entity. Which we have been trying to avoid. 


Did you submit any feedback regarding the problems with the Publish All Consumers?


Also regarding that query taking time, is it slow on the runtime of the application or just on the test query? Because in test query does not do attributes optimizations (but I didn't open DBCleaner to see if the if the output of the query was used in a way that allowed optimizations).

In that case wouldn't it suffice (for runtime) to add a Group By on the Id attribute instead of converting to an Advanced Sql?


Regards,

João Rosado

Thanks for the straight forward answer. I can imagine it's hard to pull that piece out.


The performance problems may have been somewhat related to the espace_versions, but cleaning those didn't fix our problem. We were talking about it with OutSystems Support simultaneously.

Turned out the bigger culprit was a bunch of open Activities and Business Processes.