FASTER way to upload excel file

FASTER way to upload excel file

  

Hi



I have an excel file xlsx with 70 thousand records with about 30 columns. I need to feed it by bootstrapping as fast as possible in less one minute. I am using the for each approach... this takes around 17 minutes to (up)load all the data to the database. How to improve this action so the excel file can be uploaded in less one minute? How to achieve a bulk insert of 100 thousand records/lines in excel to the database?... 

Thanks. 

ps action attached


Hello Jorge,

First thing: Why do you need to do this in under a minute? 

Bootstrap is to fill data in database, in development, mainly, so that you have data to work with. It will be triggered on publishing. 

There is absolutely NO reason to be that fast and 17 minutes to this amount of data is perfectly reasonable. 

Even if you need to do this in production, nothing will guarantee that the timer will be triggered immediately after the publishing.

So... 

If for some reason you reaaly need to be that fast, you will have to be sure you have enough bandwith to upload the file very fast, first thing.

Second, probably you will want to work with a CSV file instead of Excel. I heard that converting CSV to list is much faster than an excel. And than you will have to iterate through the list, what will take the time it needs, not much to do here.

Or... You can ask someone to create a script and insert data direct into the database (at your own risk).

So, I would say, just forget it. 17 minutes for a bootstrap that will happen only once (if you have to do it many times you have bigger problems than this), is absolutely ok, imho.

Cheers.

Hi Jorge,

For bulk insert you can use this component https://www.outsystems.com/forge/component/1117/bulkinsert/

Regards,

Marcelo

Marcelo Ferreira wrote:

Hi Jorge,

For bulk insert you can use this component https://www.outsystems.com/forge/component/1117/bulkinsert/

Regards,

Marcelo

Hello Marcelo,

While I stand for what I said, that I don't think the time it takes to upload such amount of information in a bootstrap is really a problem (I would like to understand if it is), I am curious to see how much of improvement the component can give in terms of performance. I imagine a lot, as while I can't open the code right know, I imagine that the list is iterated and the data is inserted at least in big chunks, instead of one record at a time.

Do you know how much faster it goes?

@Jorge

Could you test it and tells us how much the velocity will be improved.
You will need to know the real table name.

Cheers.

i am working with production data. (in spite of working in a lab environment, not ask why, maybe in the time it will pass to the fabric...).

"Why do you need to do this in under a minute? "

Requirement by the customer. :) 


It was a requirement by the customer to load the excel file faster than 17 minutes. Even in the final production, this website will have (?) an option to delete and (re)load again some tables . But I think this option should not be available and only enabling to update any new changes and insert only new ones records with a job (during the night) , for example. But it seems that this could be jeopardized with no warning to access to the source (some server in down or permissions to access it denied for svereal reasons). That's why the customer want to upload at times (when needed) some excel files. Some are small, but this one can be very big. 

I heard that in another applications in the customer, it was asked this requirement! (they were implemented with other technologies).


I'll try with the csv file first. Then, if not enough I'll try use the SQL loader. 

I assume it goes via a timer?


in any case, a bulk insert into a staging-table, without any checks...

(it should be possible to achieve 40K records per second)

then pass it through the real table with checks.

perhaps you can go for the 

INSERT INTO [tblTemp] ([Column1], [Column2], [Column3], [Column4])

SELECT A.[Column1], A.[Column2], A.[Column3], A.[Column4]
FROM OPENROWSET 
('Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;Database=D:\Excel.xls;HDR=YES', 'select * from [Sheet1$]') AS A;


btw, uploading a file takes time as well, is that included in your 1-minute business-requirement?


furthermore, if you als need to delete records before inserting record, i would suggest to get more rights so you can truncate the table instead of doing a "delete from"


thanks J. I 'll try it. there is no need to do any check. the original table in excel has high data quality. :)


The CSV file had corrupted data. Now I do have timeout trying to delete an entity (in SSMS). Never got this issue.

Hi Jorge,

That timeout happens because you have a DB process that is locking that table.

Regards,

Marcelo

i have used the truncate table and it solved the issue. 


Before to load the new data, I have used this script in the SQL block (in OS):

SyntaxEditor Code Snippet

DELETE FROM {entity_X}
DBCC CHECKIDENT ('entity_X',RESEED,0)

J. I stumbled upon with this error message. I supposed I need to contact the DBA to enable permission to write to the database with the command openrowset.  No there is no need to use a timer. Someone will choose the xls file and this sql should extract the info needed. I have set the uploaded button.