[CSVUtil] BulkInsert large CSV file

Forge Component
13 votes
Published on 24 Sep by Wei Zhu

I'm importing a large file containing millions of records and using this component to read the binary. This works fine, but I have the following problem.

While LoadCSV2Recordlist keeps returning X records (the chunk size), I read the chunk and write it (with AsynchronousLogging).

The first chunks are read within 300-400 ms, but every subsequent read gets slower... by now reading one chunk takes 73,571 ms.

Is there another way to use this on large CSV binaries?




This seems to be a memory leak issue: the list probably isn't being cleared, so each call holds more and more records, while the start index only identifies where to start reading the next chunk. That could explain the slowdown.

Try creating a timer to execute that function, without a schedule. When you start the process, wake the timer; each execution loads, for example, 2,000 records (a site property) and saves the last start index in an entity. If there are more records to process, wake the timer again so it continues from the last start index, and so on; a sketch of this pattern follows below.

That's a good approach for processing large amounts of data, because trying to do it all at once will probably end in a timeout.
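
Outside the OutSystems-specific pieces (timer, site property, entity), the pattern looks roughly like the sketch below. CHUNK_SIZE, CHECKPOINT_FILE and process_records are assumptions made for illustration only, not part of CSVUtil:

```python
# Illustrative sketch of the timer/checkpoint pattern described above.
# CHUNK_SIZE, CHECKPOINT_FILE and process_records() are assumptions for the
# example, not part of CSVUtil or OutSystems.
import csv
import json
import os

CHUNK_SIZE = 2000              # the "site property" in the suggestion above
CHECKPOINT_FILE = "checkpoint.json"

def load_checkpoint():
    # Last processed record index (the "entity" in the suggestion).
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["start_index"]
    return 0

def save_checkpoint(start_index):
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"start_index": start_index}, f)

def process_records(rows):
    # Placeholder for the real work (bulk insert, asynchronous logging, ...).
    pass

def run_once(csv_path):
    """One 'timer execution': handle a single chunk, return True if more remain."""
    start_index = load_checkpoint()
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        # Skipping start_index rows on every run is exactly the O(n^2)
        # behaviour discussed later in this thread.
        for _ in range(start_index):
            next(reader, None)
        chunk = [row for _, row in zip(range(CHUNK_SIZE), reader)]
    if chunk:
        process_records(chunk)
        save_checkpoint(start_index + len(chunk))
    return bool(chunk)         # True means "wake the timer again"
```

Each run does a bounded amount of work, which avoids the timeout problem, but as the comment notes it still re-skips all earlier rows on every execution.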

I already tried that. Instead of starting with line 1 as the first chunk, I started at line 3,000,000, and that shows the same problem immediately: the first chunk of 25,000 records took more than 70,000 ms.

Hmm... why is the source code not complete? I think the performance of the CSVUtil dll could be improved a lot by actually skipping the Start Index lines (roughly as sketched below).

see http://stackoverflow.com/a/25875537
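
A minimal sketch of that line-skipping idea, assuming no quoted field contains an embedded newline (a case a real CSV parser has to handle, which is partly why plain skipping is not always safe):

```python
# Sketch: skip the first start_index physical lines cheaply instead of parsing
# every field of every skipped row. Assumption: no quoted field contains an
# embedded newline, which a compliant CSV parser cannot assume.
import csv
from itertools import islice

def read_chunk_skipping_lines(csv_path, start_index, chunk_size):
    with open(csv_path, newline="") as f:
        for _ in islice(f, start_index):   # consume raw lines without CSV parsing
            pass
        reader = csv.reader(f)             # parse only the chunk we actually need
        return list(islice(reader, chunk_size))
```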

Hello


Because the component uses a start line, it has to parse the CSV from the first line and skip rows on every call.

So performance becomes slower when parsing later chunks.


To improve performance, the best way is to seek to the start line directly.

So we need not only the line position but also the byte position (a rough sketch of that idea follows this post).

I think I can finish it in two or three weeks.


Regards

Wei
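
For reference, a rough sketch of the seek-by-byte-offset idea described above: remember the byte position where the previous chunk ended and seek straight there on the next call, so earlier lines are never re-read. The function and parameter names are illustrative, not the component's actual API:

```python
# Sketch of the seek-by-byte-offset approach described above.
# Names are illustrative; this is not the actual CSVUtil API.
import csv
import io

def read_chunk_at_offset(csv_path, byte_offset, chunk_size):
    """Return (rows, next_byte_offset); feed next_byte_offset into the next call."""
    with open(csv_path, "rb") as f:
        f.seek(byte_offset)                # jump straight to the saved position
        rows = []
        while len(rows) < chunk_size:
            line = f.readline()
            if not line:                   # end of file
                break
            # Parse one physical line; assumes no embedded newlines in quoted fields.
            rows.append(next(csv.reader(io.StringIO(line.decode("utf-8"))), []))
        return rows, f.tell()              # byte position to persist for the next chunk

# Usage: persist `offset` between executions instead of a line index.
# rows, offset = read_chunk_at_offset("big.csv", 0, 25000)
```

Persisting the byte offset instead of a line index keeps each chunk read at a constant cost regardless of how far into the file it starts.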






