Hi,
We have a requirement to import more than 20 million records from a CSV file into the OutSystems database.
The data volume is very large: the table contains more than 100 columns, and we have to insert more than 20 million records.
Could you please suggest a better approach to handle this scenario?
Best Regards,
Nitin
@Nitin Chavan you can try the following steps.
1. Break the CSV into multiple CSV files.
2. Create a staging table with columns LoanCodeId (type LoanCodeMaster Identifier, nullable), LoanCode, and LoanAmount.
3. For each file, bulk insert the records into the staging table (initially with LoanCodeId as null).
4. If there are not many LoanCodeMaster rows, you can simply iterate through them and update the staging table's LoanCodeId column.
5. If the number of LoanCodeMaster rows is considerably higher, use a join to update the staging table.
6. Once all the staging rows have the correct IDs, move them to the actual table (Loan).
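The staging flow in the steps above can be sketched end to end. This is a minimal illustration using Python's built-in sqlite3 as a stand-in for the OutSystems database; the table and column names (LoanCodeMaster, Staging, Loan) follow the example in this thread, but the actual entities and SQL would be defined in Service Studio.

```python
import sqlite3

# Hypothetical in-memory schema mirroring the suggested approach: a master
# table (LoanCodeMaster), a staging table, and the final Loan table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE LoanCodeMaster (Id INTEGER PRIMARY KEY, LoanCode TEXT);
CREATE TABLE Staging (LoanCodeId INTEGER, LoanCode TEXT,
                      LoanAmount REAL, Processed INTEGER DEFAULT 0);
CREATE TABLE Loan (LoanCodeId INTEGER, LoanAmount REAL);
""")

# Master rows already exist in the database.
cur.executemany("INSERT INTO LoanCodeMaster VALUES (?, ?)",
                [(1, "HOME"), (2, "AUTO")])

# Step 3: bulk insert CSV rows into staging with LoanCodeId left NULL.
csv_rows = [("HOME", 1000.0), ("AUTO", 500.0), ("HOME", 250.0)]
cur.executemany("INSERT INTO Staging (LoanCode, LoanAmount) VALUES (?, ?)",
                csv_rows)

# Step 5: resolve the foreign key with a single join-style UPDATE
# instead of a row-by-row lookup.
cur.execute("""
UPDATE Staging
SET LoanCodeId = (SELECT Id FROM LoanCodeMaster m
                  WHERE m.LoanCode = Staging.LoanCode)
WHERE LoanCodeId IS NULL
""")

# Step 6: move resolved rows to the actual table and flag them as processed.
cur.execute("INSERT INTO Loan SELECT LoanCodeId, LoanAmount "
            "FROM Staging WHERE LoanCodeId IS NOT NULL")
cur.execute("UPDATE Staging SET Processed = 1 WHERE LoanCodeId IS NOT NULL")
conn.commit()

print(cur.execute("SELECT * FROM Loan").fetchall())
```

The key point is that the FK resolution and the final move are both set-based operations, so they scale far better than looking up each row individually.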
Some points to consider:
- Add a flag in the staging table to track whether each row has been processed.
- Create a simple UI over the staging table to check the status and verify that the data matches the CSV.
- Use timers to perform all the above activities.
- Follow timer best practices to avoid timeouts.
Thanks @Zafar Sultan !
Let us try this.
1. Break the CSV into multiple CSV files - we are getting a single CSV file (approx. 20M records) from the client. Could you please explain more on this point?
Are you suggesting a Forge component for bulk inserting rows?
Since the image you posted in your question does not show an identifier for each row in the CSV file, there are two approaches:
1. Check if you can receive multiple smaller files instead of one big file.
2. Break the file into multiple files within your process and handle each file individually.
The reason for working with batches is better control over processed rows: if your process fails or hits an exception, you know where to restart from.
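Splitting the big file into batches, as suggested above, can be sketched like this. This is a generic Python sketch (the `split_csv` helper and batch size are illustrative, not part of any OutSystems component); each emitted batch repeats the header so every smaller file is valid CSV on its own.

```python
import csv
import io

def split_csv(lines, batch_size):
    """Yield batches of rows from a CSV source, repeating the header
    in each batch so every batch is a self-contained CSV."""
    reader = csv.reader(lines)
    header = next(reader)
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            yield [header] + batch
            batch = []
    if batch:  # final partial batch
        yield [header] + batch

# Example: an in-memory CSV with 5 data rows, split into batches of 2.
data = "id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n"
batches = list(split_csv(io.StringIO(data), 2))
print(len(batches))  # 3 batches: 2 + 2 + 1 rows
```

Because the reader streams the file, this approach never loads all 20M rows into memory at once, and each batch boundary gives you a natural restart point if a run fails.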
For bulk insert you can use this component.
https://www.outsystems.com/forge/component-overview/1117/bulkinsert
Thanks for quick response!
The image I provided in the question is just for explanation purposes. In the actual file, we have a unique identifier for each row.
Do you need to transform the data before import, or just read the data and import it directly?
In any case, you cannot upload such a huge volume of data (a 20M-record CSV) in one go; you need to break it down.
If you don't need to modify (transform) your data before import, you can just use a timer with a trick to work around the default 20-minute timeout.
If you do need to modify the data before import, you could use a Light Process to queue the import.
@Vo Ton Phuc Thanks for quick response!
We have a CSV with 20M rows, and we have to convert that data into relational form.
The parent tables are already available in the DB.
We need to look up the foreign key value and then insert the row.
For example, below we take the Loan.csv file as input and insert data into the Loan table, applying a transformation.
Here we need to look up the ID of the Loan_CD (CSV column) in the master table LoanCode.
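That lookup step can be sketched as follows. This is a hypothetical illustration: the `loan_code_ids` dictionary stands in for the LoanCode master table (loaded once up front, rather than queried per row), and the column names `Loan_CD` and `Amount` follow the example in this thread.

```python
# Build the lookup once from the LoanCode master table (illustrative values).
loan_code_ids = {"HOME": 1, "AUTO": 2}

def transform(csv_row):
    """Replace the Loan_CD text with its master-table ID.
    A None LoanCodeId flags a code missing from the master table."""
    code = csv_row["Loan_CD"]
    return {
        "LoanCodeId": loan_code_ids.get(code),
        "Amount": float(csv_row["Amount"]),
    }

row = transform({"Loan_CD": "HOME", "Amount": "1000"})
```

Loading the master table into an in-memory lookup once per batch avoids issuing one database query per CSV row, which matters at 20M rows.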
Thanks!
Let me confirm a few more details about this case.
1. Do you need to process this task with OutSystems?
2. Which environment is OutSystems running in: OSCloud or self-managed?
By the way, do you need to access the master table, get the ID for the Loan Code, and map it to Loan_CD in the CSV record before inserting that record into the database?
1. Yes, we need to perform the transformation using OutSystems.
2. We are using OSCloud.
The Loan table I described is only an example; our actual scenario is similar, but with approx. 50 columns in the table.
You can check the post below, if that helps:
Thank @Puja Rani !
Hi Nitin,
I suggest writing all the records to an external database, then writing a timer in OutSystems that reads, for example, 10K records per batch. When a record is processed, remove it from the external database so it won't be processed in the next batch.
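The delete-on-process pattern suggested here can be sketched as below. This is a minimal illustration using sqlite3 as a stand-in for the queue database; the `ImportQueue` table, batch size, and payloads are all hypothetical.

```python
import sqlite3

# Hypothetical queue table holding the records still to be imported.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE ImportQueue (Id INTEGER PRIMARY KEY, Payload TEXT)")
cur.executemany("INSERT INTO ImportQueue (Payload) VALUES (?)",
                [(f"row-{i}",) for i in range(25)])

BATCH = 10
processed = []
while True:
    # Each timer run reads the next batch from the front of the queue.
    rows = cur.execute(
        "SELECT Id, Payload FROM ImportQueue ORDER BY Id LIMIT ?", (BATCH,)
    ).fetchall()
    if not rows:
        break
    processed.extend(p for _, p in rows)  # stand-in for the real import work
    # Delete the processed rows so the next run starts with fresh records.
    cur.executemany("DELETE FROM ImportQueue WHERE Id = ?",
                    [(i,) for i, _ in rows])
    conn.commit()

print(len(processed))  # 25
```

Deleting (or flagging) rows as they are processed makes each run idempotent: a restart after a failure simply picks up whatever is still in the queue.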
Thanks @Damian Fonville !
We can use the OutSystems database as well; why write to an external database?
Then the client would need to procure another database.
Hello @Nitin Chavan
The best approach I can suggest is to create a timer with an action flow that processes a limited number of records at a time (around 1,000) and commits the transaction after each iteration, so that records that are already processed won't be processed again.
For that, you need to create logic (or update flags) so that your loop starts from the index of the next unprocessed records. Also, at the start of the action flow, store the current time in a variable to ensure your action will not time out; if that time budget is exceeded, you can wake the timer again from within the flow.
I am also attaching links to articles that explain this in detail:
https://www.outsystems.com/forums/discussion/60880/best-practices-timer-example-of-shown-code/
https://www.osquay.com/knowledge-center/how-to-make-sure-a-heavy-timer-doesnt-misbehave-in-the-outsystems-platform
https://www.slideshare.net/OutSystems/using-processes-and-timers-for-longrunning-asynchronous-tasks
https://www.outsystems.com/training/lesson/1749/odc-2018-heavy-timers
https://www.outsystems.com/forums/discussion/81684/best-solution-of-timer/
I hope this will help you get this sorted.
Best Regards,
Tousif Khan
Thanks @Tousif Khan !
Does the following approach look good?
Hello @Nitin Chavan ,
If any of the solutions helped you, please mark it as the solution; that makes it easier for others to find the answer to the same question.
Hi @Tousif Khan ,
We have tried this using the CSVUtil and BulkInsert Forge components.
We tried it with a CSV file of the following size:
Records - 1.1 M
Columns - 65
Size - 450 MB
We are doing this operation through a timer, importing the data in chunks of 10,000 records. Toward the end of the CSV file we see extreme slowness in CSVUtil.
Please refer to the question below:
https://www.outsystems.com/forums/discussion/89807/csvutil-csvutil-slowness-for-large-file/
Regards,