Importing millions of records in Database [Solved] · 336 Views · 16 Comments

Hi,

We have a requirement to import more than 20 million records from a CSV file into the OutSystems database.

The volume of data is huge: the table contains more than 100 columns, and we have to perform an insert operation for more than 20 million records.

Could you please suggest a good approach to handle this scenario?

Best Regards,

Nitin


2022-11-15 12-05-50
Zafar Sultan
Solution

@Nitin Chavan you can try the following steps.

1. Break the CSV into multiple smaller CSV files.

2. Create a staging table with columns LoanCodeId (type LoanCodeMaster Identifier, nullable), LoanCode, and LoanAmount.

3. For each file, bulk insert the records into the staging table (with LoanCodeId initially null).

4. If there are not many LoanCodeMaster rows, you can simply iterate through them and update the staging table's LoanCodeId column.

5. If the number of LoanCodeMaster rows is considerably higher, use a join to update the staging table.

6. Once all the staging rows are ready with the correct IDs, move them to the actual table (Loan).

Some points to consider:

- Add a flag to the staging table to track whether each row has been processed.

- Create a simple UI over the staging table to check the status and make sure the data matches the CSV.

- Use timers to perform all of the above activities.

- Follow timer best practices to avoid timeouts.
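The staging flow in steps 2–6 can be sketched in plain SQL. This is a hypothetical illustration, not OutSystems code: Python's sqlite3 stands in for the OutSystems database, and the table/column names (plus the IsProcessed flag suggested above) follow the steps.

```python
import sqlite3

# In-memory SQLite stands in for the OutSystems database in this sketch.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE LoanCodeMaster (Id INTEGER PRIMARY KEY, LoanCode TEXT)")
cur.execute("CREATE TABLE Staging (LoanCodeId INTEGER, LoanCode TEXT,"
            " LoanAmount REAL, IsProcessed INTEGER DEFAULT 0)")
cur.execute("CREATE TABLE Loan (LoanCodeId INTEGER, LoanAmount REAL)")

# Master data is already present in the database (illustrative values).
cur.executemany("INSERT INTO LoanCodeMaster (Id, LoanCode) VALUES (?, ?)",
                [(1, "HOME"), (2, "AUTO")])

# Step 3: bulk insert CSV rows with LoanCodeId left NULL.
rows = [("HOME", 1000.0), ("AUTO", 2500.0), ("HOME", 800.0)]
cur.executemany("INSERT INTO Staging (LoanCode, LoanAmount) VALUES (?, ?)", rows)

# Step 5: one join-based (correlated) update resolves every LoanCodeId at once.
cur.execute("""
    UPDATE Staging
    SET LoanCodeId = (SELECT Id FROM LoanCodeMaster m
                      WHERE m.LoanCode = Staging.LoanCode)
    WHERE LoanCodeId IS NULL
""")

# Step 6: move resolved rows to the actual Loan table and flag them as processed.
cur.execute("""
    INSERT INTO Loan (LoanCodeId, LoanAmount)
    SELECT LoanCodeId, LoanAmount FROM Staging
    WHERE LoanCodeId IS NOT NULL AND IsProcessed = 0
""")
cur.execute("UPDATE Staging SET IsProcessed = 1 WHERE LoanCodeId IS NOT NULL")
con.commit()
```

In OutSystems the two UPDATE/INSERT statements would map naturally onto advanced SQL queries run from a timer.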

2022-08-26 11-04-22
Nitin Chavan

Thanks @Zafar Sultan !

Let us try this.

1. Break the CSV into multiple CSV files - we are receiving a single CSV file (approx. 20M records) from the client. Could you please explain more about this point?

Are you suggesting any Forge component for bulk inserting rows?

2022-11-15 12-05-50
Zafar Sultan

As the image you posted in your question does not have an identifier for each row in the CSV file, there are two approaches:

1. Check if you can receive multiple smaller files instead of one big file.

2. Break the file into multiple files in your process and process each file individually.

The reason for working with batches is better control over processed rows. If your process fails or there is an exception, you know where to restart from.
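Approach 2 can be sketched as follows. This is a minimal illustration: chunks are kept as in-memory strings for brevity, whereas a real job would write each chunk to its own file before handing it to the bulk-insert step.

```python
import csv
import io

def split_csv(src, chunk_size):
    """Split one large CSV stream into smaller CSVs of at most chunk_size
    data rows each; the header row is repeated in every chunk."""
    reader = csv.reader(src)
    header = next(reader)
    chunks, buf = [], []
    for row in reader:
        buf.append(row)
        if len(buf) == chunk_size:
            chunks.append(_write_chunk(header, buf))
            buf = []
    if buf:  # final, possibly smaller, chunk
        chunks.append(_write_chunk(header, buf))
    return chunks

def _write_chunk(header, rows):
    """Serialize one chunk back to CSV text (a real job would write a file)."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()

# A tiny file with 5 data rows, split into chunks of 2 -> 3 smaller "files".
src = io.StringIO("LoanCode,LoanAmount\n" + "".join(f"HOME,{i}\n" for i in range(5)))
chunks = split_csv(src, 2)
```

Because the reader streams the source, memory use stays bounded by the chunk size rather than the full 20M-row file.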

For bulk insert you can use this component.

https://www.outsystems.com/forge/component-overview/1117/bulkinsert

2022-08-26 11-04-22
Nitin Chavan

Thanks for the quick response!

The image I provided in the question is just for explanation purposes. In the actual file, a unique identifier is present for each row.

2021-10-07 07-03-31
Vo Ton Phuc

Do you need to customize the data before importing, or just read the data and import it directly?

Either way, you cannot upload such a huge volume of data (20M CSV records) in one go. You need to break it down.

If you don't need to modify (customize) your data before importing, you can just use a timer, with a trick to bypass the default 20-minute timeout.

If you need to modify the data before importing, you could use a Light Process to queue the import.

2022-08-26 11-04-22
Nitin Chavan

@Vo Ton Phuc Thanks for the quick response!

We have a CSV with 20M rows, and we have to convert that data into relational form, as the parent tables are already available in the DB.

We need to check the foreign key value and then insert the row.

For example, below we take the Loan.csv file as input and insert the data into the Loan table, performing the transformation along the way.

Here we need to look up the ID for Loan_CD (a CSV column) in the master table LoanCode.

Thanks!
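That Loan_CD lookup might look roughly like the sketch below. It assumes the LoanCode master is small enough to load into memory once; the column names (Loan_CD, Loan_Amount) and master values are illustrative, not taken from the real file.

```python
# Hypothetical FK resolution: load the LoanCode master table once into a
# dict, then transform each raw CSV row into a relational Loan record.
master = {"HOME": 1, "AUTO": 2, "EDU": 3}   # LoanCode -> Id (illustrative data)

def transform(csv_row):
    """Turn one raw CSV dict into a Loan record, or None when Loan_CD has
    no match in the master table (such rows can be parked for review)."""
    loan_code_id = master.get(csv_row["Loan_CD"])
    if loan_code_id is None:
        return None
    return {"LoanCodeId": loan_code_id,
            "LoanAmount": float(csv_row["Loan_Amount"])}
```

Doing the lookup in memory avoids one database round trip per row, which matters at 20M rows.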

2021-10-07 07-03-31
Vo Ton Phuc

Let me confirm a little more about this case.

1. Do you need to process this task in OutSystems?
2. Which environment are you using: OutSystems Cloud or self-managed?

By the way, do you need to look up the master table to get the ID for the Loan Code and map it to Loan_CD in the CSV record before inserting that record into the database?

2022-08-26 11-04-22
Nitin Chavan

1. Yes, we need to perform the transformation using OutSystems.

2. We are using OutSystems Cloud.

The Loan table I described is only an example; we have a similar scenario, but with approx. 50 columns in the table.

Puja Rani

Hi,

You can check the post below, if that helps:

2022-08-26 11-04-22
Nitin Chavan

Thanks @Puja Rani!


2023-02-09 12-36-42
Damian Fonville

Hi Nitin,

I suggest writing all the records to an external database, then writing a timer in OutSystems that reads, for example, 10K records each batch. When a record has been processed, remove it from the external database so it won't be processed in the next batch.




2022-08-26 11-04-22
Nitin Chavan

Thanks @Damian Fonville !

We could use the OutSystems database as well; why write to an external database? Then the client would need to procure another database.

2023-10-21 19-42-11
Tousif Khan
Champion

Hello @Nitin Chavan 

The best way I can suggest is to create a timer, with an action flow that processes some records at a time asynchronously, around 1,000, and commits the transaction after each iteration so that records that have already been processed won't be processed again.

For that, you need to create logic or update flags in such a way that your loop starts from the index where there are new records to process. Also, at the start of the action flow, store the start time in a variable to ensure your action does not time out; if that time limit is about to pass, you can wake the timer from within the flow.
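Translated out of OutSystems terms, that timer pattern is roughly the sketch below. CHUNK, TIME_BUDGET, and the four callbacks are hypothetical stand-ins for server actions and entity operations, not real platform APIs.

```python
import time

CHUNK = 1000                   # records processed per committed batch
TIME_BUDGET = 15 * 60          # stop well before the platform timeout (seconds)

def run_timer(fetch_unprocessed, process, commit, wake_timer):
    """Process unflagged records in committed batches; if the time budget
    runs out, reschedule the timer and exit cleanly instead of timing out."""
    start = time.monotonic()
    while True:
        batch = fetch_unprocessed(CHUNK)     # only rows not yet flagged
        if not batch:
            return "done"                    # nothing left: do not reschedule
        for record in batch:
            process(record)
        commit()                             # flag the batch so it is never redone
        if time.monotonic() - start > TIME_BUDGET:
            wake_timer()                     # reschedule and exit before timeout
            return "rescheduled"
```

Committing per batch is what makes a restart safe: a rescheduled run simply picks up the first unflagged batch.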

I am also attaching a link to the article that has explained this in detail.

https://www.outsystems.com/forums/discussion/60880/best-practices-timer-example-of-shown-code/

https://www.osquay.com/knowledge-center/how-to-make-sure-a-heavy-timer-doesnt-misbehave-in-the-outsystems-platform

https://www.slideshare.net/OutSystems/using-processes-and-timers-for-longrunning-asynchronous-tasks

https://www.outsystems.com/training/lesson/1749/odc-2018-heavy-timers

https://www.outsystems.com/forums/discussion/81684/best-solution-of-timer/

I hope this will help you get this sorted.

Best Regards,
Tousif Khan

2022-08-26 11-04-22
Nitin Chavan

Thanks @Tousif Khan !

Does the following approach look good?

  • Create an IsSuccess column in the staging table to track whether each record was processed successfully.
  • Create an entity to store the last processed record index, using the autogenerated ID column of the staging table to track the start index of the loop.
  • After the timer finishes a chunk of 1,000 records, update the last ID in the entity and reawaken the timer.
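As a rough sketch of that checkpoint idea (the "entity" holding the last processed ID is modeled as a one-element dict, and all names are illustrative), the key assumption is that the autogenerated Id is monotonically increasing:

```python
# Hypothetical checkpoint entity: stores the last successfully processed Id.
checkpoint = {"last_id": 0}

def next_batch(staging_rows, batch_size):
    """Return the next batch of staging rows whose autogenerated Id is past
    the checkpoint; relies on Id increasing with insertion order."""
    start = checkpoint["last_id"]
    return [r for r in staging_rows if r["Id"] > start][:batch_size]

def finish_batch(batch):
    """After a successful commit, advance the checkpoint so a timer restart
    resumes exactly where the previous run stopped."""
    if batch:
        checkpoint["last_id"] = batch[-1]["Id"]

# Ten staged rows, processed in batches of four.
rows = [{"Id": i, "IsSuccess": False} for i in range(1, 11)]
b1 = next_batch(rows, 4)
finish_batch(b1)
b2 = next_batch(rows, 4)
finish_batch(b2)
```

Filtering by `Id > last_id` (rather than row position) is what makes the resume point stable even if earlier rows are deleted or re-flagged.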



2023-10-21 19-42-11
Tousif Khan
Champion

Hello @Nitin Chavan ,

If any of the solutions helped you, please mark them as a solution; it would help others find the answer to the same problem.

2022-08-26 11-04-22
Nitin Chavan

Hi @Tousif Khan ,

We have tried this using the CSVUtil and BulkInsert Forge components.

We tried it with a CSV file of the following size:

Records - 1.1M

Columns - 65

Size - 450 MB

We are doing this operation through a timer, importing the data in chunks of 10,000. Toward the end of the CSV file we see extreme slowness in CSVUtil.
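One plausible cause of slowness that only appears toward the end of the file (an assumption about the import loop, not a confirmed diagnosis of CSVUtil): if each chunk re-reads the CSV from the start and skips the rows already imported, total work grows quadratically with file size, while keeping one reader open keeps it linear. A sketch of the two shapes:

```python
import csv
import io

def read_chunk_by_skipping(csv_text, skip, size):
    """Re-open the stream and skip `skip` rows on every call: the skipping
    is repeated for each chunk, so total work over the file is quadratic."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)                       # header
    for _ in range(skip):
        next(reader)                   # wasted work, repeated every chunk
    return [row for _, row in zip(range(size), reader)]

def read_chunks_streaming(csv_text, size):
    """Keep one reader open and yield consecutive chunks: each row is read
    exactly once, so cost stays linear from the first chunk to the last."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)                       # header
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk
```

If the component only exposes skip-style chunk reads, pre-splitting the file (as suggested earlier in this thread) sidesteps the problem, since each small file is cheap to re-read.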

Please refer below question-  

https://www.outsystems.com/forums/discussion/89807/csvutil-csvutil-slowness-for-large-file/

Regards,

Nitin
