Hi,
We have a requirement to import more than 20 million records from a CSV file into the OutSystems database.
The data volume is very large: the table contains more than 100 columns, and we have to insert more than 20 million records.
Could you please suggest a better approach to handle this scenario?
Best Regards,
Nitin
@Nitin Chavan you can try the following steps.
1. Break the CSV into multiple CSV files.
2. Create a staging table with columns LoanCodeId (type LoanCodeMaster Identifier, nullable), LoanCode, and LoanAmount.
3. For each file, bulk insert the records into the staging table (initially with LoanCodeId as null).
4. If there are not many LoanCodeMaster rows, you can simply iterate through them and update the staging table's LoanCodeId column.
5. If the number of LoanCodeMaster rows is considerably higher, use a join to update the staging table.
6. Once all the staging rows have the correct IDs, move them to the actual table (Loan).
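The staging flow in the steps above can be sketched end to end. This is a minimal illustration using Python's built-in sqlite3 as a stand-in for the OutSystems database; the table and column names (LoanCodeMaster, Staging, Loan) follow the example in this thread, but the actual entities and SQL would be defined in Service Studio.

```python
import sqlite3

# Hypothetical in-memory schema mirroring the suggested approach: a master
# table (LoanCodeMaster), a staging table, and the final Loan table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE LoanCodeMaster (Id INTEGER PRIMARY KEY, LoanCode TEXT);
CREATE TABLE Staging (LoanCodeId INTEGER, LoanCode TEXT,
                      LoanAmount REAL, Processed INTEGER DEFAULT 0);
CREATE TABLE Loan (LoanCodeId INTEGER, LoanAmount REAL);
""")

# Master rows already exist in the database.
cur.executemany("INSERT INTO LoanCodeMaster VALUES (?, ?)",
                [(1, "HOME"), (2, "AUTO")])

# Step 3: bulk insert CSV rows into staging with LoanCodeId left NULL.
csv_rows = [("HOME", 1000.0), ("AUTO", 500.0), ("HOME", 250.0)]
cur.executemany("INSERT INTO Staging (LoanCode, LoanAmount) VALUES (?, ?)",
                csv_rows)

# Step 5: resolve the foreign key with a single join-style UPDATE
# instead of a row-by-row lookup.
cur.execute("""
UPDATE Staging
SET LoanCodeId = (SELECT Id FROM LoanCodeMaster m
                  WHERE m.LoanCode = Staging.LoanCode)
WHERE LoanCodeId IS NULL
""")

# Step 6: move resolved rows to the actual table and flag them as processed.
cur.execute("INSERT INTO Loan SELECT LoanCodeId, LoanAmount "
            "FROM Staging WHERE LoanCodeId IS NOT NULL")
cur.execute("UPDATE Staging SET Processed = 1 WHERE LoanCodeId IS NOT NULL")
conn.commit()

print(cur.execute("SELECT * FROM Loan").fetchall())
```

The key point is that the FK resolution and the final move are both set-based operations, so they scale far better than looking up each row individually.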
Some points to consider:
- Add a flag in the staging table to track whether each row has been processed.
- Create a simple UI over the staging table to check the status and verify that the data matches the CSV.
- Use timers to perform all the above activities.
- Follow timer best practices to avoid timeouts.
Thanks @Zafar Sultan !
Let us try this.
1. Break the CSV into multiple CSV files - we are getting a single CSV file (approx. 20M records) from the client. Could you please explain more on this point?
Are you suggesting a Forge component for bulk inserting rows?
Since the image you posted in your question does not show an identifier for each row in the CSV file, there are two approaches:
1. Check if you can receive multiple smaller files instead of one big file.
2. Break the file into multiple files within your process and handle each file individually.
The reason for working with batches is better control over processed rows: if your process fails or hits an exception, you know where to restart from.
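Splitting the big file into batches, as suggested above, can be sketched like this. This is a generic Python sketch (the `split_csv` helper and batch size are illustrative, not part of any OutSystems component); each emitted batch repeats the header so every smaller file is valid CSV on its own.

```python
import csv
import io

def split_csv(lines, batch_size):
    """Yield batches of rows from a CSV source, repeating the header
    in each batch so every batch is a self-contained CSV."""
    reader = csv.reader(lines)
    header = next(reader)
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            yield [header] + batch
            batch = []
    if batch:  # final partial batch
        yield [header] + batch

# Example: an in-memory CSV with 5 data rows, split into batches of 2.
data = "id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n"
batches = list(split_csv(io.StringIO(data), 2))
print(len(batches))  # 3 batches: 2 + 2 + 1 rows
```

Because the reader streams the file, this approach never loads all 20M rows into memory at once, and each batch boundary gives you a natural restart point if a run fails.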
For bulk insert you can use this component.
https://www.outsystems.com/forge/component-overview/1117/bulkinsert
Thanks for quick response!
The image I provided in the question is just for explanation purposes. In the actual file, we have a unique identifier for each row.
Do you need to transform the data before import, or just read the data and import it directly?
In any case, you cannot upload such a huge volume of data (a 20M-record CSV) in one go; you need to break it down.
If you don't need to modify (transform) your data before import, you can just use a timer with a trick to work around the default 20-minute timeout.
If you do need to modify the data before import, you could use a Light Process to queue the import.
@Vo Ton Phuc Thanks for quick response!
We have a CSV with 20M rows, and we have to convert that data into relational form.
The parent tables are already available in the DB.
We need to look up the foreign key value and then insert the row.
For example, below we take the Loan.csv file as input and insert data into the Loan table, applying a transformation.
Here we need to look up the ID of the Loan_CD (CSV column) in the master table LoanCode.
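That lookup step can be sketched as follows. This is a hypothetical illustration: the `loan_code_ids` dictionary stands in for the LoanCode master table (loaded once up front, rather than queried per row), and the column names `Loan_CD` and `Amount` follow the example in this thread.

```python
# Build the lookup once from the LoanCode master table (illustrative values).
loan_code_ids = {"HOME": 1, "AUTO": 2}

def transform(csv_row):
    """Replace the Loan_CD text with its master-table ID.
    A None LoanCodeId flags a code missing from the master table."""
    code = csv_row["Loan_CD"]
    return {
        "LoanCodeId": loan_code_ids.get(code),
        "Amount": float(csv_row["Amount"]),
    }

row = transform({"Loan_CD": "HOME", "Amount": "1000"})
```

Loading the master table into an in-memory lookup once per batch avoids issuing one database query per CSV row, which matters at 20M rows.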
Thanks!
Let me confirm a few more details about this case.
1. Do you need to process this task with OutSystems?
2. Which environment is OutSystems running in: OSCloud or self-managed?
By the way, do you need to access the master table, get the ID for the Loan Code, and map it to Loan_CD in the CSV record before inserting that record into the database?
1. Yes, we need to perform the transformation using OutSystems.
2. We are using OSCloud.
The Loan table I described is only an example; our actual scenario is similar, but with approx. 50 columns in the table.
You can check the post below, if that helps:
Thank @Puja Rani !
Hi Nitin,
I suggest writing all the records to an external database, then writing a timer in OutSystems that reads, for example, 10K records per batch. When a record is processed, remove it from the external database so it won't be processed in the next batch.
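The delete-on-process pattern suggested here can be sketched as below. This is a minimal illustration using sqlite3 as a stand-in for the queue database; the `ImportQueue` table, batch size, and payloads are all hypothetical.

```python
import sqlite3

# Hypothetical queue table holding the records still to be imported.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE ImportQueue (Id INTEGER PRIMARY KEY, Payload TEXT)")
cur.executemany("INSERT INTO ImportQueue (Payload) VALUES (?)",
                [(f"row-{i}",) for i in range(25)])

BATCH = 10
processed = []
while True:
    # Each timer run reads the next batch from the front of the queue.
    rows = cur.execute(
        "SELECT Id, Payload FROM ImportQueue ORDER BY Id LIMIT ?", (BATCH,)
    ).fetchall()
    if not rows:
        break
    processed.extend(p for _, p in rows)  # stand-in for the real import work
    # Delete the processed rows so the next run starts with fresh records.
    cur.executemany("DELETE FROM ImportQueue WHERE Id = ?",
                    [(i,) for i, _ in rows])
    conn.commit()

print(len(processed))  # 25
```

Deleting (or flagging) rows as they are processed makes each run idempotent: a restart after a failure simply picks up whatever is still in the queue.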
Thanks @Damian Fonville !
We can use the OutSystems database as well; why write to an external database?
Then the client would need to procure another database.
Hello @Nitin Chavan
The best approach I can suggest is to create a timer with an action flow that processes a limited number of records at a time (around 1,000) and commits the transaction after each iteration, so that records that are already processed won't be processed again.
For that, you need to create logic (or update flags) so that your loop starts from the index of the next unprocessed records. Also, at the start of the action flow, store the current time in a variable to ensure your action will not time out; if that time budget is exceeded, you can wake the timer again from within the flow.
I am also attaching links to articles that explain this in detail:
https://www.outsystems.com/forums/discussion/60880/best-practices-timer-example-of-shown-code/
https://www.osquay.com/knowledge-center/how-to-make-sure-a-heavy-timer-doesnt-misbehave-in-the-outsystems-platform
https://www.slideshare.net/OutSystems/using-processes-and-timers-for-longrunning-asynchronous-tasks
https://www.outsystems.com/training/lesson/1749/odc-2018-heavy-timers
https://www.outsystems.com/forums/discussion/81684/best-solution-of-timer/
I hope this will help you get this sorted.
Best Regards,
Tousif Khan
Thanks @Tousif Khan !
Does the following approach look good?
Hello @Nitin Chavan ,
If any of the solutions helped you, please mark it as the solution; that makes it easier for others to find the answer to the same question.
Hi @Tousif Khan ,
We have tried this using the CSVUtil and BulkInsert Forge components.
We tried it with a CSV file of the following size:
Records - 1.1 M
Columns - 65
Size - 450 MB
We are doing this operation through a timer, importing the data in chunks of 10,000 records. Toward the end of the CSV file we see extreme slowness in CSVUtil.
Please refer to the question below:
https://www.outsystems.com/forums/discussion/89807/csvutil-csvutil-slowness-for-large-file/
Regards,