Big difference between query 'took' time and the node's execution time

Hello,

I'm seeing strange behavior with an Advanced SQL node.
My action flow is below.

The following is a real sample from the General Log in Service Center.

2019-11-19 17:29:41  LogMessage2                                                Debug
2019-11-19 17:29:09  Query FetchKoujis.GetKoujisByEigyousyos.List took 2922 ms  SLOWSQL
2019-11-19 17:29:09  LogMessage                                                 Debug

Why does it take 32 s (LONG!!) between LogMessage and LogMessage2 even though 'Query took' is displayed as only 2.9 s (2922 ms)?
How do I speed up the execution of this node?

The data returned by the SQL node has only 140 rows and 10 columns.
The columns consist of string and date types; the longest string columns are under 200 characters.
The tables used in the SQL are in an external Oracle 11g database.

Other environment info:
* Platform Server 11.0.607.0
* Windows Server 2016

Thanks.

Hi Takasi,

The LogMessage action is asynchronous - there's no guarantee of when the data is written to the Log Entity. It's not your code that's taking so long, it's just the message queue being processed.

Instead of relying on the time of write to the General tab, you could write a CurrDateTime() value into each log and use that for comparison.

Hello Afonso,

Thank you for your comment.

I know the General Log doesn't record the exact time.
I also tried recording the exact time with a function and with another SQL snippet, as you suggested.
But the action's execution time really is more than 30 s - also by my wristwatch.

Why does it take so long when the query took only 2.9 s?
How do I improve the action's performance?

Performance can be tricky to diagnose, but I see three main issues here:

 - you say you've measured more than 30s, but does it take 30s to execute, or for you as a user to see a response in the browser? You should separate execution time from completion time. I'd measure this with a local variable initialized with CurrDateTime(), logged right after your Start node; then, right before the End node, I'd add another log that subtracts this local variable from a new CurrDateTime(), so you know exactly how long the code takes to execute. This excludes network issues and isolates performance to just the code within the action;

 - you have a query that takes almost 3 seconds to return 100 records. This is worrying - is it complex? Do you perform a lot of calculations? I see that it's external, so I'd ask if 3 seconds is "normal". Can you run the query directly in the database to compare?

 - if you have 30 seconds of total execution time and you know the query takes 3, that's 27 seconds unaccounted for. Not knowing what happens in the initial If node (are you using functions that could be obfuscating additional logic?), or what happens in the String Join action, the remaining time could be being eaten up by other logic.

I think the main priority is to exclude network issues (by using logs at the start and at the end of the Action), and should you confirm that it's not related to the network, trying to measure how long it takes to execute each separate step of your logic.
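In plain Python, the measurement pattern above looks roughly like the sketch below. This is an illustration only - the real thing would be LogMessage nodes and CurrDateTime() in Service Studio, and fetch_koujis here is a hypothetical stand-in for the Advanced SQL node:

```python
from datetime import datetime

def fetch_koujis():
    # Hypothetical stand-in for the FetchKoujis Advanced SQL node.
    return ["row1", "row2"]

def get_koujis_action():
    start = datetime.now()                  # local variable = CurrDateTime(), right after Start
    print(f"LogMessage0 - {start}")         # log the start time itself
    rows = fetch_koujis()                   # the Advanced SQL node
    result = ";".join(rows)                 # the String Join step
    elapsed = (datetime.now() - start).total_seconds()
    print(f"LogMessage2 - elapsed {elapsed:.1f}s")  # log the delta right before End
    return result
```

Anything beyond the logged delta (browser rendering, network latency, queued log writes) is then completion time, not execution time.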

Hello Afonso,

Thank you for spending your time on my problem.

You should separate execution time from completion time

I changed my action as below. All LogMessage nodes use CurrDateTime().

And obtained the following log.

2019-11-20 09:39:46 LogMessage3 - 2019-11-20 09:39:46, (43s)                  Debug
2019-11-20 09:39:46 LogMessage2 - 2019-11-20 09:39:46   Debug
2019-11-20 09:39:03 Query FetchKoujis.GetKoujisByEigyousyos.List took 5250 ms SLOWSQL
2019-11-20 09:39:03 LogMessage1 - 2019-11-20 09:39:03                         Debug
2019-11-20 09:39:03 LogMessage0 - 2019-11-20 09:39:03                         Debug

I see that it's external, so I'd ask if 3 seconds is "normal". Can you run the query directly in the database to compare?

Yes, it is normal. The query is a little complex. A few days ago, executing it directly with SQL*Plus in a console took about 3 to 10 seconds. (Unfortunately, I cannot currently run it again.)

Not knowing what happens in the initial If node (are you using functions that could be obfuscating additional logic?), or what happens in the String Join action, the remaining time could be being eaten up by other logic.

The log I pasted above shows that the SQL node alone takes almost all of the time.

Is there another action I should try?

Hi Takasi,

You can use LifeTime Analytics to look in more detail at that specific action and query. This will give you more insight into where the delay is and whether it might be network related (a slow connection to the external DB server?).

Regards,
Lennart

It's strange - there's something else taking up the time between LogMessage1 and LogMessage2, but all that's there is the query, and that only accounts for a fraction of it (5 seconds out of 40).

Can you Test the query? How long does it take to return results?

Lennart's suggestion is solid as well - do you have access to LifeTime? The Analytics tab could provide insight into what's going on.

Hello Lennart,  
Hello Afonso,

Thank you for comments.

Unfortunately, the environment doesn't have LifeTime currently.
Yes, it's strange.

My colleague came up with an idea that could explain this phenomenon.
The 'Query took' time in the log is probably the time until the first data can be returned, not the time until the node finishes obtaining all the data.

For example, with the following query, the external Oracle database will return the first row in a short time, while a long time is needed to finish returning all the data. The 'Query took' time in the log may indicate only that first interval.

select    -+
    ...    |
from       | <simple query (fast)>
    ...   -+

union all

select    -+
    ...    |
from       | <complex query (slow)>
    ...   -+
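This behavior can be sketched with any cursor-based database driver. Below is a minimal Python illustration (using sqlite3 rather than Oracle or the OutSystems data layer, with an invented table and query just for the demo): execute() returns as soon as the first row is available, while fetchall() pays for the full scan.

```python
import sqlite3
import time

# Illustration only: with cursor-based drivers, execute() can return as
# soon as the first row is ready; the expensive scan happens during fetch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(2000)])

cur = conn.cursor()

t0 = time.perf_counter()
# The first matching pair (0, 0) is found immediately, so execute()
# returns fast even though the full cross join is 4,000,000 combinations.
cur.execute("SELECT a.n, b.n FROM t a, t b WHERE (a.n * b.n) % 97 = 0")
t_execute = time.perf_counter() - t0

t0 = time.perf_counter()
rows = cur.fetchall()   # the real work: stepping through all combinations
t_fetch = time.perf_counter() - t0

print(f"execute: {t_execute*1000:.1f} ms, fetchall: {t_fetch*1000:.1f} ms, rows: {len(rows)}")
```

A 'query took' counter that stops when execute() returns would report only the first, small number - exactly what the hypothesis above predicts for a UNION ALL whose fast branch yields rows first.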

I've always been led to believe that it's the full time for the query execution, but I'm not too familiar with Oracle - it could be possible.

Do you suppose you could extract and isolate this logic into a new eSpace and try reproducing it there? Worst case, if you cannot draw any further conclusions, you'll at least have the problem contained/reproduced and can contact OutSystems Support for their suggestions.

Hello Afonso,

Sorry for my delayed reply.
Thank you for commenting.
I have asked OutSystems Support.
When I obtain any information from them, I will share it with you here.

OutSystems Support has been investigating this issue, and I haven't received their reply yet.
By the way, I am now able to reproduce the issue on a Personal Environment.
My colleague's hypothesis is probably correct.
The attached OML file reproduces the problem; the screenshots are from that OML.

Thank you for the update and the sample Takasi - I ran it in my personal environment and got very similar times in the log. Testing the SQL query also took around 8 seconds, matching those times very closely - looks like your colleague might be right.

Hopefully support can confirm this.

Tracking number RPD-4719 was created for this issue.
Some of support's comments were as follows.

When a query is executed by the platform, the start time is saved, and as soon as the query finishes its execution in the database, even if the full dataset has not yet been returned, the end time is also saved and the difference is logged as a SLOWSQL entry.

As an example, with your query, the execution time with the whole dataset returned is 7 seconds; however, the platform only obtains the results when they are needed. While the results are being fetched from the database, since the query was already executed, a pointer to the dataset is returned from the database, and consequently the SLOWSQL entry is generated with that time difference.

That said and specifically addressing your latest question, we would like to inform you that we fully understand your specific use case and we find your concerns valid. Therefore, we have opened a new defect, with an improvement suggestion, to be analyzed internally by our development team, which will be done according to backlog and severity.

As of now, we do not know when and if these improvements will be applied to our product, but you can track the ID RPD-4719 in our "Release Notes".

Regarding your current situation, the only way we found to troubleshoot any slow database nodes in your logic – either Aggregates or Advanced SQL – would be by adding the additional logging to your applications’ logic, as explained in our previous communication.