[ReactFilePondUpload] Uploading numerous images causes FilePond upload performance to drop


Hi, I have tested uploading 1500 images using this component and observed that performance drops compared to when I was uploading only 300 images. Is anyone else experiencing this kind of scenario? I can also present benchmark times for 300 images vs 1500 images.


When uploading 300/300 images, it took only 7 minutes to finish the whole process, from reading all the files to uploading them to the server.

When uploading 1500 images, the first 300/1500 alone took 30 minutes for the same process as above.

Hi Jaybriel,

Thank you for performing this test and posting the results.

The Network tab in your browser's dev tools should give you a good idea of what is occurring.

The uploads in this component are hardcoded to a maximum of 5 parallel files, so as not to overload the server when multiple users are uploading.  This is why the time to upload 1500 files increases roughly linearly compared to 300 files.  You can see in this screenshot from Chrome's Network tab how the start of the upload of the 6th file is delayed until the first 5 have completed.
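To illustrate the queueing behaviour, here is a minimal sketch of a concurrency-limited upload queue (this is an illustration only, not the component's actual code; `uploadFile` and the `/upload` endpoint are placeholders):

```typescript
// Minimal sketch of a concurrency-limited upload queue. With maxParallel = 5,
// the 6th file cannot start until one of the first 5 workers finishes.
async function uploadAll(files: File[], maxParallel = 5): Promise<void> {
  let next = 0;

  // Each worker repeatedly takes the next unclaimed file until none remain.
  const worker = async (): Promise<void> => {
    while (next < files.length) {
      const file = files[next++];
      await uploadFile(file);
    }
  };

  // Start maxParallel workers and wait for all of them to drain the queue.
  await Promise.all(Array.from({ length: maxParallel }, worker));
}

// Placeholder single-file upload; "/upload" is an assumed endpoint.
async function uploadFile(file: File): Promise<void> {
  const body = new FormData();
  body.append("file", file);
  await fetch("/upload", { method: "POST", body });
}
```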

It is also worth evaluating the performance of uploads against other concurrent web requests in the application, to ensure a user uploading 1500 files does not amount to a sort of denial-of-service attack.

I have not considered this component as an ingestion tool, but rather as a way to give a better user experience to users uploading a few files.  However, that does not mean it could not be a valid scenario.

If you are looking for a change to the component:

Could you confirm whether your project requires an ingestion tool that must upload 1500 files at one time in the production application, or whether you are just benchmarking different upload components?

Also, what is your expectation of the performance of the upload, and of its impact on the rest of the application?

I hope this helps clarify things!

Kind regards,

Stuart

Good day Stuart,

Sorry for the late reply. Our project really needs to upload 1500 files at one time, and it will be deployed to production.

Our expectation is that the upload of 1500 files should take less than 30 minutes.

We would highly appreciate any help with this requirement, whether it can be resolved directly or you can give us an idea of how to resolve it. Thank you very much!

Hi Jaybriel,

I could make the maximum parallel uploads configurable instead of hardcoded to 5.  That way you could change it to a value that suits your scenario.  Would that help?

You may already be across this, but I will mention it in case it helps.  A common pattern for handling imports with many uploaded files is to separate the upload and the processing of the files into distinct steps.

This way you can upload files and prepare them over a period of time, then, when you are ready, click a button to begin processing them.  The benefits are:

* The time required to upload is reduced

* If an upload fails, there is no need to rollback any processing as it has not started yet

* If there is an error during processing, the uploads do not need to be done again

* If the error is related to one of the uploaded files, only that file needs to be re-uploaded, not the whole lot.

Additionally, separating the processing of files into batches, or if possible processing them one by one, can help protect the processing of a particular file from being affected by a failure in another, as in the sketch below.
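As a rough sketch of that last point (the statuses and function names here are illustrative, not part of the component):

```typescript
// Illustrative two-step pattern: files are uploaded first and marked "staged";
// processing runs later, per file, so one failure does not affect the others.
type StagedFile = { id: string; name: string; status: "staged" | "done" | "error" };

async function processStagedFiles(files: StagedFile[]): Promise<void> {
  for (const file of files) {
    try {
      await processOne(file);   // hypothetical per-file processing step
      file.status = "done";
    } catch (err) {
      file.status = "error";    // only this file needs attention or re-upload
      console.error(`Processing failed for ${file.name}:`, err);
    }
  }
}

// Placeholder for whatever the import actually does (resize, parse, store...).
async function processOne(file: StagedFile): Promise<void> {
  // ...
}
```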

Of course, I do not know your specific requirements, so you will have to evaluate these patterns against your own needs.

I hope this helps!

Kind regards,

Stuart

Hi guys,


Good discussion here. Thank you very much for your inputs!


Stuart, the file processing already happens asynchronously after the upload, so we don't believe that is the cause of the slow behaviour; rather, the serialization happening in the browser seems to be the main cause. After stretching the max parallel uploads to 10 some progress was achieved (2h -> 1h+), but we are still far from the desired time for such an upload scenario.

After looking into this further, and still considering reusing the ReactFilePondUpload component, some of the most interesting feedback obtained so far suggests archiving the images before their upload (e.g. through a JS library such as https://stuk.github.io/jszip/). In light of this, does anyone know if such a plugin already exists for the wrapped FilePond library? [edit] FilePond's plugins page does not seem to show anything related to this... maybe we do need to work around this by reviewing the requirements...
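For reference, a minimal sketch of the archiving idea using JSZip (the `/upload-archive` endpoint is hypothetical; a server-side unzip step would be needed):

```typescript
import JSZip from "jszip";

// Bundle the selected images into one zip on the client, so a single request
// replaces hundreds of individual uploads. JPEGs are already compressed, so
// "STORE" (no re-compression) avoids wasted CPU time.
async function uploadAsZip(files: File[]): Promise<void> {
  const zip = new JSZip();
  for (const file of files) {
    zip.file(file.name, file);
  }
  const archive = await zip.generateAsync({ type: "blob", compression: "STORE" });

  const body = new FormData();
  body.append("file", archive, "images.zip");
  await fetch("/upload-archive", { method: "POST", body }); // assumed endpoint
}
```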


Thanks!

Hi Pedro,

Yes, I understand the primary issue is the length of time it takes to upload the files.  The point of separating the processing is to avoid having to upload files multiple times.  But it sounds like that is not required in your scenario, so fair enough.

Most image formats are already compressed, so you may not achieve much improvement with zip.  But again, I do not know your scenario well enough; it may help.

I am afraid there is nothing further I can do if the limit is the actual bandwidth and the size and number of files.  The control aids uploading files; it does not increase the speed with which files are uploaded.  Unfortunately, the bytes need to be transferred, and there is no way around that.

Wishing you all the best and hope you find a workable solution.

Kind regards,

Stuart

Hi everyone,


We ran tests in increments of 300 files on ReactFilePondUpload. By increasing the max parallel uploads to 10, we observed that the upload time improved drastically.


Here are the results:

| Files | Duration | Started | Ended  |
| ----- | -------- | ------- | ------ |
| 300   | 3 mins   | 2:53pm  | 2:56pm |
| 600   | 5 mins   | 3:01pm  | 3:06pm |
| 900   | 9 mins   | 3:08pm  | 3:17pm |
| 1200  | 16 mins  | 3:18pm  | 3:34pm |
| 1500  | 30 mins  | 3:36pm  | 4:06pm |


I hope these tests help the ReactFilePondUpload component improve its stability, as some applications really require huge numbers of files to be uploaded. Thanks!


Thanks Jaybriel,

I appreciate your benchmarking, that really helps understand what is going on.

In that case, it seems it will be useful to expose max parallel uploads as a configuration parameter.

It will take me a week to release the next version, as I am updating to the latest version of FilePond and having some trouble with uploads in Firefox, in the interplay between the library and the plugin.

However, I would stress that this plugin is designed for user experience in file uploads.  Uploading hundreds of files at one time is not a scenario I had considered.

Thanks again and I hope this is a workable solution for you.

Kind regards,

Stuart

Hi Jaybriel,

A new version of the plugin is available now that has a configurable option for max parallel uploads.

The new version is set to "under development". Later this week I will switch it to "stable" (or upload a new version if I find problems).  Any feedback you have would be welcome.  I have done some testing to confirm the new option adjusts the maximum number of parallel uploads.  I will do some more testing this week.


Also, I owe Pedro Gonçalves an apology.  Pedro, I read your response as if it were from Jaybriel, so I misunderstood what you meant, and I think I also misread your response.  I really appreciate you commenting and helping out with the solution.

Yes Pedro, I believe the serialization in the browser is a factor, in that it has to read the whole file at one time instead of streaming it to the server.  I could not find a way around this, though.  I was not able to use a normal HTTP upload facility as I did with the Traditional Web FilePondUpload plugin, because React has no Preparations, which is what I had used to receive the file upload.  In React I had to load the file completely on the client side using the HTML5 File API and send it through a client action.  The conversion to base64 is OK, I think, because I believe this is how OutSystems uploads binaries anyway, so it has to happen regardless; and I don't believe it is happening twice, but I am not 100% sure.
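The client-side read described above looks roughly like this (a simplified sketch, not the component's exact code):

```typescript
// Read a file fully in the browser and return its base64 payload, using the
// HTML5 File API. readAsDataURL yields "data:<mime>;base64,<payload>", so the
// prefix is stripped before returning.
function fileToBase64(file: File): Promise<string> {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onerror = () => reject(reader.error);
    reader.onload = () => resolve((reader.result as string).split(",")[1]);
    reader.readAsDataURL(file);
  });
}
```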

Thank you, great suggestion about compression.  As many image formats are already compressed it may not help for images, but other document types would benefit.  I imagine 2 configuration options: one to switch on compression, the other to specify the file size at which compression starts to be used.  Compressing files smaller than the network packet size would not be worth it.  This looks like a good candidate to try: https://github.com/beatgammit/gzip-js
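Something along these lines is what I have in mind (a sketch only; it uses the browser's built-in CompressionStream rather than gzip-js just to stay self-contained, and both the option names and the threshold default are hypothetical):

```typescript
// Hypothetical options: a switch to enable compression and a minimum size
// below which compression is skipped (files smaller than a network packet
// are not worth compressing).
const MIN_COMPRESS_BYTES = 1400; // assumed default, roughly one packet

async function maybeCompress(file: File, compressionEnabled: boolean): Promise<Blob> {
  if (!compressionEnabled || file.size < MIN_COMPRESS_BYTES) {
    return file; // send as-is
  }
  // Gzip the file's stream; the server would need to gunzip on receipt.
  const gzipped = file.stream().pipeThrough(new CompressionStream("gzip"));
  return new Response(gzipped).blob();
}
```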

Please let me know if the compression solution would genuinely be useful for you and I will put it on the roadmap to look at.

Stuart, by all means, don't apologize. We honestly appreciate all the attention to this component's threads, and everyone on the team has been quite amazed by your responsiveness and constructive feedback. Jay's reply above already demonstrates the great progress we achieved thanks to that single hint of increasing the max parallel requests. We have also recently made additional efforts with the customer, discussing the feature's requirements and mitigating the NFR aspects, so we will now allow a lower number of files to be uploaded, and no compression is required as of now (i.e. not urgent). It is also very true that the images' format (JPEG) is already quite compressed. So thank you for the great support! Awesome community spirit!