[BulkInsert] Duplicate Records When Running Multiple Timers with BulkInsert

Application Type: Reactive

I am facing an issue when running three timers in parallel in OutSystems. Each timer reads a different CSV file and uses BulkInsert to insert data into the same database table.

Although the CSV files do not contain any duplicate records, I am seeing duplicate rows in the target table after the process completes.

Here are some details:

  • Each timer processes its own file and commits after each batch.

  • There is no overlap in the data between the files.

  • The table currently does not have a UNIQUE constraint.

Siya
 
MVP

@XuyenHoang :  You mentioned that "each timer processes its own file and commits after each batch".  One possible cause of duplicate records is that if an exception occurs after a few batches, the timer automatically retries the execution. This could result in duplicate insertions because it starts processing from the beginning again. 

Shingo Lam

It makes sense to follow the timer best practice of keeping a flag that indicates a record has been processed. Then if the timer fails and retries, it won't try to insert the already-processed records again.

Moreover, please add timer wake-up logic to make sure it won't get stuck by a timeout exception.

Hope this helps.

Thanks and best regards,

Shingo Lam
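The per-record flag idea above can be sketched in SQL roughly like this (the table and column names here are assumptions for illustration, not the actual schema):

```sql
-- Hypothetical sketch: track a processed flag per staged record.
-- On a retry, the timer selects only rows that were never committed.
SELECT record_id, payload
  FROM staging_records
 WHERE is_processed = 0
 ORDER BY record_id;

-- After each successfully committed batch, mark those rows as done
-- in the same transaction as the insert:
UPDATE staging_records
   SET is_processed = 1
 WHERE record_id BETWEEN :batch_start AND :batch_end;
```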

XuyenHoang


Hi Shingo,

Thank you for sharing the best practices. I’ve already implemented a mechanism to manage files using a dedicated table, where each file has a status for success or failure. The timer only picks up files that have not been processed yet, so we avoid re-inserting completed records.

Additionally, I’m controlling the timer’s timeout behavior. If it approaches the timeout limit, the timer will be rewoken to prevent it from getting stuck.

The main challenge I’m facing now is when running three files in parallel and performing bulk inserts into the same table. This is causing issues as mentioned in my previous QA update, and I’m currently working on resolving that.

Thanks & Best regards,
Xuyen Hoang

Shingo Lam

It's great to hear that the best practice is applied.

That said, does it make sense to track only at the file level? How can you tell whether an individual record within a file has already been inserted or not? But never mind, let's focus on the issue first.

There are some approaches that we can check for this:

  1. The default timeout of the bulk insert execution is around 300s for SQL Server and MySQL, and 90s for Oracle. Will your file exceed this limit?
  2. Have you turned useTableLock on? It can make the insert faster.
  3. You can debug to see what inputs are typically passed to the BulkInsert action.

Hope this helps.

Shingo Lam

Siya
 
MVP

@XuyenHoang : You mentioned "three files in parallel". Could you please clarify what you mean? Do you have three separate Timers (e.g., Timer A, Timer B, Timer C)? In OutSystems, a single Timer cannot run multiple instances in parallel.


XuyenHoang

Hi @Siya,

I have three separate Timers (A, B, and C), and they run simultaneously. Each Timer processes a different CSV file and uses BulkInsert to insert the data into the same table.

Siya
 
MVP

Thank you for the clarification @XuyenHoang. Could you please put a log inside the server action that processes the file, i.e., the name of the Timer plus the name of the file, so that you will know if two timers are picking up the same file at the same time.

Earlier you mentioned that "the timer only picks up files that have not been processed yet". How are you ensuring that a file is considered "not processed"? Do you have a status column with values such as Not Processed, Success, and Failure, and does the timer select only the records marked Not Processed?

If so, what prevents multiple timers from picking the same file when they start at the same time? For example, if three timers trigger concurrently, all three could potentially select the same unprocessed file.

What mechanism have you implemented to avoid this race condition?

XuyenHoang


I manage files via a control table that tracks status values: Not Processed, Success, and Failure. When Timers A, B, and C start, each retrieves a distinct file. This is enforced by an Oracle SQL query using FOR UPDATE SKIP LOCKED, which locks the record as soon as it is selected, ensuring that no two timers can acquire the same file concurrently.
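As a rough illustration, such a claiming step could look like the following PL/SQL sketch. The table and column names are assumptions, not the actual schema; note also that Oracle does not allow FOR UPDATE together with FETCH FIRST, so a cursor is used:

```sql
DECLARE
  CURSOR c_files IS
    SELECT file_id
      FROM file_control
     WHERE status = 'Not Processed'
     ORDER BY file_id
       FOR UPDATE SKIP LOCKED;  -- rows already locked by other timers are skipped
  v_id file_control.file_id%TYPE;
BEGIN
  OPEN c_files;                 -- locks the available rows for this session
  FETCH c_files INTO v_id;      -- take the first file this timer can claim
  IF c_files%FOUND THEN
    UPDATE file_control
       SET status = 'InProgress'
     WHERE CURRENT OF c_files;  -- record the claim before releasing the lock
  END IF;
  CLOSE c_files;
  COMMIT;                       -- releases the lock; the status now guards the file
END;
/
```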

Siya
 
MVP

Thanks for the details; now it’s clear that no two timers will pick the same record.

Focusing on a single timer: if a file has 50K records and the timer has already inserted 20K when it’s about to expire and be retriggered, how do you ensure that the next run continues from record 20,001 instead of restarting from the beginning and inserting duplicates?

Shingo Lam

So it means that you store the file in the database, then pick it out based on the status. I am curious that:

  • Timer A picks file 1 to import.
  • While timer A has not completed, file 1's status is still Not Processed, so obviously timer B will pick file 1, too.
  • Same for timer C.

I recommend adding one more status such as "Processing". Whenever a timer picks a file, update it to "Processing".

Hope this helps.
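This status-flip idea can also be implemented without explicit row locks, as a guarded UPDATE. This is only a sketch with assumed table and column names:

```sql
-- Hypothetical sketch: claim a file atomically by flipping its status.
-- Only one session's UPDATE can win; the others affect zero rows.
UPDATE file_control
   SET status = 'Processing'
 WHERE file_id = :candidate_file_id
   AND status = 'Not Processed';

-- In PL/SQL, check SQL%ROWCOUNT afterwards: if it is 0, another timer
-- already claimed this file, so move on to the next candidate.
```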

XuyenHoang

In the file management table, we store the S3 file key and maintain 4 statuses:

  • Processing: when the file management record is created

  • InProgress: when the timer starts processing the file

  • Complete: when the file is successfully inserted

  • Fail: when the insert operation fails

XuyenHoang


Each file only contains 10K records, and after every insert operation we immediately commit the changes. We've also implemented controls to handle timer timeouts, so the process won't restart from the beginning or create duplicates.
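One common way to make such a restart safe is a per-file checkpoint. A sketch, with assumed table and column names, could be:

```sql
-- Hypothetical checkpoint sketch: persist the last committed row number
-- per file, so a re-woken timer resumes instead of starting over.
UPDATE file_control
   SET last_committed_row = :rows_done
 WHERE file_id = :file_id;

-- On wake-up, read the checkpoint and skip the already-inserted rows:
SELECT last_committed_row
  FROM file_control
 WHERE file_id = :file_id;
```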

Amit Jat

Hi @XuyenHoang 

Please follow the steps below; I hope they will work for you:

  • Identify the columns that define a unique record (natural key).

  • Deduplicate current data using ROW_NUMBER() and remove true duplicates.

  • Add a UNIQUE index/constraint on those columns.

  • Switch timers to BulkInsert → staging → MERGE/UPSERT into final table (or use ON CONFLICT DO NOTHING).

  • Test with multiple parallel timers in a dev environment.
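Steps 2 through 4 above could be sketched in Oracle SQL as follows. All table, column, and constraint names here are hypothetical placeholders, not the actual schema:

```sql
-- 2. Deduplicate existing data: keep one row per natural key,
--    deleting every extra copy found by ROW_NUMBER().
DELETE FROM target_table
 WHERE rowid IN (
         SELECT rid
           FROM (SELECT rowid AS rid,
                        ROW_NUMBER() OVER (PARTITION BY customer_id, order_date
                                           ORDER BY rowid) AS rn
                   FROM target_table)
          WHERE rn > 1);

-- 3. Enforce uniqueness going forward.
ALTER TABLE target_table
  ADD CONSTRAINT uq_target_key UNIQUE (customer_id, order_date);

-- 4. BulkInsert into a staging table first, then MERGE into the final
--    table; rows whose key already exists are silently skipped.
MERGE INTO target_table t
USING staging_table s
   ON (t.customer_id = s.customer_id AND t.order_date = s.order_date)
 WHEN NOT MATCHED THEN
  INSERT (customer_id, order_date, amount)
  VALUES (s.customer_id, s.order_date, s.amount);
```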
