Forge Component
26 votes
Published on 28 Sep by Wei Zhu

Hi Wei - can you update to support Long Integer type for StartIndex?

I've got CSV files longer than 65k rows that I need to parse...

Thanks!


Hi Taka-San


Currently StartIndex is an Integer, and its max value is 2,147,483,647.

So I think it should be fine with 65k rows.

But if you have any problems with it, please let me know.


Regards

Wei

Hi Wei-san, 

Sorry, I meant longer than the max signed 32-bit integer - so more than 2,147,483,647.

Any chance you can change StartIndex to the Long Integer type?


Also, can you return cntNum (the actual number of rows that it received) in LoadCSV2RecordList?

Otherwise, if I loop to read a large number of rows in multiple batches, how can I know whether I've finished reading all of it?

For example, if the file has 1000 rows, and I tell it to read 300 rows at a time, I want cntNum to tell me:

Loop 1: StartIndex: 0, MaxNum: 300, cntNum: 300

Loop 2: StartIndex: 300, MaxNum: 300, cntNum: 300

Loop 3: StartIndex: 600, MaxNum: 300, cntNum: 300

Loop 4: StartIndex: 900, MaxNum: 300, cntNum: 100

Loop 4 means I read the whole file, because MaxNum > cntNum.

Otherwise, how can I know when I've finished reading the whole file as I loop?

(This is important if the CSV file has more than 2^31 rows and I have to use a Timer to read a little at a time for batch processing.)
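
To make the looping pattern concrete, here is a minimal sketch in Python. LoadCSV2RecordList is an OutSystems action, so the function below is a hypothetical stand-in for the proposed signature (StartIndex and MaxNum in, the rows and cntNum out), not the component's actual code:

```python
def load_csv_to_record_list(rows, start_index, max_num):
    """Hypothetical stand-in for LoadCSV2RecordList: returns up to
    max_num rows starting at start_index, plus cnt_num, the number
    of rows actually read."""
    batch = rows[start_index:start_index + max_num]
    return batch, len(batch)

def read_in_batches(rows, max_num=300):
    """Read the whole 'file' batch by batch, stopping when a batch
    comes back short (cnt_num < max_num)."""
    start_index, collected = 0, []
    while True:
        batch, cnt_num = load_csv_to_record_list(rows, start_index, max_num)
        collected.extend(batch)
        if cnt_num < max_num:   # short batch: the whole file has been read
            break
        start_index += cnt_num
    return collected

# 1000 rows read 300 at a time -> cnt_num sequence 300, 300, 300, 100
assert read_in_batches(list(range(1000))) == list(range(1000))
```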


Solution

I customized the CSVUtil and put a copy in my Forge component:

https://www.outsystems.com/forge/Component_Overview.aspx?ProjectId=6111

Modified CSVUtil 1.10.3: Updated LoadCSV2RecordList Action:

  • StartIndex (int --> long)
  • MaxNum (int --> long)
  • Added output parameter cntNum (long) for the number of rows actually read


Solution

Hi Taka-San


> How can I know when I've finished reading the whole file as I loop?

  You can check the length of the list.

  If it is zero, then it means there is no more data.
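
  As a sketch of that check, reusing the hypothetical load_csv_to_record_list stand-in from the snippet above:

```python
# Wei's termination condition: stop when the returned list is empty.
rows = list(range(1000))
start_index, total = 0, 0
while True:
    batch, _ = load_csv_to_record_list(rows, start_index, 300)
    if len(batch) == 0:        # empty list: there is no more data
        break
    total += len(batch)        # process the batch here
    start_index += len(batch)
assert total == 1000
```

  Note that, compared to checking cntNum against MaxNum, this always costs one extra call, since the loop only stops after an empty read.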


>  This is important if the CSV file has more than 2^31 rows and I have to use a Timer to read a little at a time for batch processing

  If your file has more than 2^31 rows, it will be 20 GB+ (assuming at least 10 bytes per row).

  In that case, you'd better split the file into smaller files first, because:

  - Every time you call LoadCSV2RecordList, it parses from the first row up to StartIndex.

    Usually this overhead can be ignored, but if your file is huge, it becomes a serious performance issue.

  - If you can split it into smaller files and have enough server resources, you can use BPT to do parallel processing.
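
  A minimal sketch of that splitting step in Python, streaming the source file so the full 20 GB never sits in memory; the chunk size and output file names here are illustrative, not part of the component:

```python
import csv
import os

def split_csv(path, rows_per_chunk=1_000_000, out_dir="chunks"):
    """Stream a large CSV into smaller files of rows_per_chunk rows each,
    repeating the header row in every chunk."""
    os.makedirs(out_dir, exist_ok=True)
    with open(path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        out, writer, chunk = None, None, -1
        for i, row in enumerate(reader):
            if i % rows_per_chunk == 0:   # start a new chunk file
                if out:
                    out.close()
                chunk += 1
                out = open(os.path.join(out_dir, f"part_{chunk:05d}.csv"),
                           "w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
            writer.writerow(row)
        if out:
            out.close()
```

  Each chunk can then be parsed from its first row, avoiding the parse-up-to-StartIndex overhead, and the chunks are natural units of work for BPT parallel processing.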

 

Best Regards

Wei