1 Million Records

1 Million Records

  

Hi there, 

If i have 1 million records to process and it is called from automatic activity in BPT, what is the best way to do?

regards, thank you.

@Indo

First recommendation, whether your context is BPT or not, is do not iterate 1 Million records and do an Update. 

For massive updates/inserts you should consider using an Advanced Query.

Something like Update MyEntity Set Is_Active = 0 Where Is_Active = 1, for instance.

Solution

Hello Indo,

Adding to João Heleno answer,

If you need to do complex processing other than just update a lot of records the same way, it's better to make the activity start a timer and when the timer finishes make it signaling to the activity. Like here:

https://success.outsystems.com/Support/Enterprise_Customers/Maintenance_and_Operations/BPT_-_Automatic_Activities_Timeout

One of the reasons is that BPT have a hardcoded timeout of 5 minutes. It was not meant to be used for HEAVY processing.

But even the timer can timeout.

What you do is to stablish a threshold, fetch some records to process, process and mark them as processed. After each record processed, check if the threshold time was reached and if yes, wake the timer and finishes the action. When there is no more records to process, warn the process and finish the timer.

Cheers.
Eduardo Jauch

Solution

Optimal solution real depends on the use case, including the type of processing to apply to each record and the speed required to process them.

Adding to the previous replies, you may also consider launching several process instances or conditional start branches  where an automatic activity would process a specific chunk of data. This way you could increase the processing throughput.


Cheers,

Tiago.  

Thank you all for the reply.

Reading that link, i am still confused. 

1. Each BPT process instance is taken from a pool of process threads.

2. In a BPT there is an activity that processes that million record, so it takes one thread instance.

3. After launch a timer to process the million records, and do wait activity for the timer to finish, isn't it the same case that it is still use on process thread without timer (not using a timer also consume one thread)?

4. Or Wait Activity has a special feature where its thread process that contain it can be used for other BPT while in idle state (not yet time out or the entity action that close it not trigger yet)? 

thank you and regards,




Eduardo Jauch wrote:

Hello Indo,

Adding to João Heleno answer,

If you need to do complex processing other than just update a lot of records the same way, it's better to make the activity start a timer and when the timer finishes make it signaling to the activity. Like here:

https://success.outsystems.com/Support/Enterprise_Customers/Maintenance_and_Operations/BPT_-_Automatic_Activities_Timeout

One of the reasons is that BPT have a hardcoded timeout of 5 minutes. It was not meant to be used for HEAVY processing.

But even the timer can timeout.

What you do is to stablish a threshold, fetch some records to process, process and mark them as processed. After each record processed, check if the threshold time was reached and if yes, wake the timer and finishes the action. When there is no more records to process, warn the process and finish the timer.

Cheers.
Eduardo Jauch


Both an activity and A timer can have time out, so what is the different of using them the same, say: using an activity and then to sure completeness "fetch some records to process, process and mark them as processed"?

I am still confused, say an activity has been time out while the long process in database (say calling a store procedure in a database) is running, what does happen? Does the process in database terminate? Is a new activity instance created..where the start point to be executed? 

regards


Indo wrote:

Both an activity and A timer can have time out, so what is the different of using them the same, say: using an activity and then to sure completeness "fetch some records to process, process and mark them as processed"?

For 2 reasons:

1. The timer timeout can be defined (by default is 20 min), while the timeout of the BPT is 5 min fixed.

2. From inside the action the timer calls, you can wake the timer again, and stop the action. Using this possibility you can implement a solution to keep processing the records until near the timeout, than wake the timer and close the action, what will start the action again to continue processing the records not processed yet. BPT is much heavier and more complicated to do this.

You have also other advantages as well with timers.

Indo wrote:

I am still confused, say an activity has been time out while the long process in database (say calling a store procedure in a database) is running, what does happen? Does the process in database terminate? Is a new activity instance created..where the start point to be executed? 

regards


If I remember well, two things will happen:

1. The transaction will rowback (unless you are doing commits)

2. The activity will be restarted with increasing intervals

Cheers.


"The transaction will rollback (unless you are doing commits)"

Is it correct when i call a store procedure in sql advance widget, then if time out, the process in the database is also stopped? So, there will be communication from the OS to the database to tell the database that it is timeout and do rollback (unless it is committed already in the database)?

regards,


Eduardo Jauch wrote:


1. The timer timeout can be defined (by default is 20 min), while the timeout of the BPT is 5 min fixed.



@Eduardo et al

The 5 minutes applies only to Automatic Activities. 

Indo wrote:

Thank you all for the reply.

Reading that link, i am still confused. 

1. Each BPT process instance is taken from a pool of process threads.

2. In a BPT there is an activity that processes that million record, so it takes one thread instance.

3. After launch a timer to process the million records, and do wait activity for the timer to finish, isn't it the same case that it is still use on process thread without timer (not using a timer also consume one thread)?

4. Or Wait Activity has a special feature where its thread process that contain it can be used for other BPT while in idle state (not yet time out or the entity action that close it not trigger yet)? 

thank you and regards,





hi, i drop twos, and this is not answered yet.

thank you.

João Heleno wrote:

Eduardo Jauch wrote:


1. The timer timeout can be defined (by default is 20 min), while the timeout of the BPT is 5 min fixed.



@Eduardo et al

The 5 minutes applies only to Automatic Activities. 

@Joao

What is the timeout of the callback actions in BPT?


Indo wrote:

"The transaction will rollback (unless you are doing commits)"

Is it correct when i call a store procedure in sql advance widget, then if time out, the process in the database is also stopped? So, there will be communication from the OS to the database to tell the database that it is timeout and do rollback (unless it is committed already in the database)?

regards,


AFAIK, unless the sp commits the data, the transaction is started by the platform and when the sp does not return until timeout, the platform will abort the transaction.

Cheers

Eduardo Jauch wrote:

@Joao

What is the timeout of the callback actions in BPT?

Must dig that one up... but nothing prevents a BPT process from running more than 5 minutes. If, for instance, a BPT process has two sequential Automatic Activities (with a timeout of 5min), this process will (in theory) be able to run 10 minutes straight.


Indo wrote:

Hi there, 

If i have 1 million records to process and it is called from automatic activity in BPT, what is the best way to do?

regards, thank you.


Hi Indo, As Joao said I would not iterate through 1 million records if at all possible. Your actual solution will depend on what you are actually trying to do with the records.

For example in your code above you are only using the assign operator so presumably you are just modifying a property in the record. If that is the case then you could try creating a manual or derived field within the aggregate itself to update the value. Alternatively an advanced SQL query would give you ability to do switch or if statements directly in the SQL field.

If you were instead trying to update the values in the entity then an advanced SQL query using the Update statement would be a good way.

Or if you really need to iterate through every record and do something then you could look at doing it in batches and calling a separate Activity for each batch, that way each batch has 5 minutes to run.

If you could give a little more detail on exactly what you are trying to do, or if you want to run a couple of specific scenarios, then might be able to give a more succinct answer.


João Heleno wrote:

Eduardo Jauch wrote:

@Joao

What is the timeout of the callback actions in BPT?

Must dig that one up... but nothing prevents a BPT process from running more than 5 minutes. If, for instance, a BPT process has two sequential Automatic Activities (with a timeout of 5min), this process will (in theory) be able to run 10 minutes straight.


Ah! Yes, of course. When I said BPT, I was not meaning "process", as you have waits, human activities, multiple automatic activities each with 5 min Each, etc.

But I am really interested in know what is the timeout of the callback actions...

Cheers