97 Views
5 Comments
Solved

[CSVUtil] CSVUtil slowness for large file

csvutil (Forge asset by Wei Zhu)
Application Type: Service

Hi Team,

We have a requirement to import data from a CSV file that contains 1 million+ records.

We run this operation through a Timer, importing the data in chunks of 10,000 records:

  • We use the CSVUtil Forge component to read the records, with MaxNum = 10000 and the current StartIndex persisted in an Entity.
  • We use the BulkInsert Forge component to insert each chunk of 10,000 records into the database.
  • As the image below shows, the LoadCSV2RecordList action is fast for the first 50,000 records.
  • Toward the end of the file the LoadCSV2RecordList action becomes extremely slow. In the Service Center log below, fetching CSV rows 400,000 to 410,000 took more than 20 seconds, and we have 1 million records to process. How can we overcome this slowness in CSVUtil?
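For reference, the Timer logic described above is equivalent to this sketch (Python pseudocode, illustrative only; `load_csv_chunk`, `bulk_insert`, and the start-index callbacks are stand-ins for the LoadCSV2RecordList and BulkInsert Forge actions and the Entity that stores StartIndex, not real APIs):

```python
CHUNK = 10_000  # MaxNum passed to the CSV-loading action

def run_import(load_csv_chunk, bulk_insert, get_start, save_start):
    """Model of the Timer: read one chunk at a time, insert it,
    and checkpoint the next StartIndex so a re-run resumes safely."""
    start = get_start()                  # StartIndex kept in an Entity
    rows = load_csv_chunk(start, CHUNK)  # stand-in for LoadCSV2RecordList
    while rows:
        bulk_insert(rows)                # stand-in for BulkInsert
        start += len(rows)
        save_start(start)                # persist progress for the Timer
        rows = load_csv_chunk(start, CHUNK)
```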

Best Regards,

Nitin

2021-06-21 12-37-59
Wei Zhu
Solution

@Nitin Chavan

StartIndex / MaxNum are meant for loading a large CSV file that cannot fit into memory all at once.
For example, with a 5 GB CSV you could load around 100 MB at a time to avoid an Out Of Memory error.
On the other hand, each call has to parse the CSV from the first line and skip rows up to StartIndex, so the last 100 MB takes much longer than the first 100 MB.
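That re-parsing cost can be shown with a minimal sketch (Python, illustrative only; this is not the CSVUtil implementation, just the same access pattern):

```python
import csv

def load_chunk(path, start_index, max_num):
    """Illustrative stand-in for LoadCSV2RecordList: every call
    re-opens the file and parses rows one by one up to start_index,
    so the skip cost grows linearly with start_index."""
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.reader(f)):
            if i < start_index:
                continue              # skipped rows are still parsed
            if len(rows) >= max_num:
                break
            rows.append(row)
    return rows

# Reading 1,000,000 rows in chunks of 10,000 means the parser
# re-scans 0 + 10,000 + 20,000 + ... + 990,000 = 49,500,000 rows
# just to skip ahead, on top of the 1,000,000 rows actually used.
```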

In your case you have 1 million+ records.
I'm not sure of your data size, but MaxNum = 10000 seems too small.
Why don't you try MaxNum = 200K, or even load the whole CSV if you have enough memory?
This will decrease the overhead and improve efficiency.

If you still need BulkInsert with a ChunkSize of 10000,
simply loop over the CSVList and append each row to a temporary buffer;
when the buffer is full (the list reaches 10000 rows) or you hit the last line of the CSV, flush it (BulkInsert, then ListClear).
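The buffer-and-flush loop can be sketched like this (Python, illustrative only; `bulk_insert` stands in for the BulkInsert Forge action):

```python
def import_all(csv_list, bulk_insert, chunk_size=10_000):
    """Loop over the fully loaded CSV list, fill a temporary buffer,
    and flush it when it is full or at the last row."""
    buffer = []
    for row in csv_list:
        buffer.append(row)
        if len(buffer) == chunk_size:
            bulk_insert(list(buffer))  # flush a full chunk (BulkInsert)
            buffer.clear()             # ListClear equivalent
    if buffer:                         # flush the final partial chunk
        bulk_insert(list(buffer))
```

This way the CSV is parsed exactly once, and only the database inserts are chunked.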




Regards
Wei
2022-08-26 11-04-22
Nitin Chavan

Thank you @Wei Zhu !

Your solution worked for me :)

I loaded the whole CSV. Here are the statistics:

  • CSV file size = 450 MB
  • Rows in CSV file = 1 million

According to the Service Center log:

CSVUtilExtension.LoadCSV2RecordList took 106308 ms.

2024-12-17 14-32-59
Matthias Preuter
 
MVP

Maybe you could split the file first using the FileSystem component's low-level file actions:


FS_Open

Opens a file for low-level reading.


FS_ReadBytes

Reads the specified number of bytes from a file opened with FS_Open.


FS_Seek

Moves the file read pointer to a specific position in the file opened with FS_Open.


FS_Close

Closes a file previously opened with FS_Open.
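A split along those lines could look like this sketch (Python, illustrative only; plain file I/O modeling the FS_Open / FS_Seek / FS_ReadBytes / FS_Close actions, with each chunk extended to the next newline so no CSV row is cut in half):

```python
def read_line_aligned_chunks(path, chunk_bytes):
    """Read a file in fixed-size byte chunks, extending each chunk
    to the end of the current line so rows stay intact."""
    chunks = []
    with open(path, "rb") as f:            # FS_Open
        f.seek(0, 2)
        size = f.tell()                    # total file size in bytes
        pos = 0
        while pos < size:
            f.seek(pos)                    # FS_Seek: Position = pos
            data = f.read(chunk_bytes)     # FS_ReadBytes: NumberOfBytes
            if pos + len(data) < size and not data.endswith(b"\n"):
                data += f.readline()       # extend to the line boundary
            chunks.append(data)
            pos += len(data)
    return chunks                          # file closed on 'with' exit
```

Note one caveat: this naive splitting assumes no field contains an embedded newline inside quotes; quoted multi-line fields would need a real CSV-aware scan.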


2022-08-26 11-04-22
Nitin Chavan

Thanks @Matthias Preuter for the quick response!

How do I calculate the specific Position to read the file from?

Will it be the same as the StartIndex of the LoadCSV2RecordList action?

2022-08-26 11-04-22
Nitin Chavan

Hi @Matthias Preuter

Is there an example of reading a CSV file in chunks?

I am not sure what values to set for:

FS_ReadBytes – the NumberOfBytes attribute

FS_Seek – the Position attribute
