Upload large (8 GB) files
Hi all,
Has anyone had any experience manually uploading large files to a DSS Folder? We are attempting to upload an 8 GB file, and partway through, the status bar just disappears and the upload appears to have stopped without completing.
Is this just too large? Or is there something we should be doing differently?
Thanks!
Marlan
Operating system used: Linux Red Hat
Best Answer
-
JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 296 Dataiker
Hi @Marlan,
The file would need to be accessible from the machine that is hosting DSS in order to point your managed folder to it. Can you please try compressing the file and uploading it once more to your managed folder?
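If no compression tool is at hand, the compression step could be sketched in Python with the standard library. This is only an illustration: the helper name `zip_for_upload` and the file name in the usage note are made up for the example, and how much the file shrinks depends entirely on the data.

```python
import zipfile
from pathlib import Path


def zip_for_upload(src, dest=None):
    """Compress a single file into a .zip archive and return the archive path.

    Hypothetical helper for illustration; not part of DSS.
    """
    src = Path(src)
    # Default: put "<name>.zip" next to the source file.
    dest = Path(dest) if dest else src.with_name(src.name + ".zip")
    # ZIP_DEFLATED enables real compression (the default is store-only).
    # zipfile allows ZIP64 by default, so multi-GB inputs are supported.
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(src, arcname=src.name)
    return dest
```

For example, `zip_for_upload("file.csv")` would write `file.csv.zip` in the same directory, and that archive is what you would then upload to the managed folder.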
You can also transfer the file to the machine that is hosting DSS using scp and then point the managed folder to it as you had done earlier:
scp file.csv remote_username@ip:/remote/directory
Another option would be to load the file to a cloud storage connection or SQL database and either create a dataset from there or point a folder to it.
Let us know if you have any questions.
Thanks!
Jordan
Answers
-
JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 296 Dataiker
Hi @Marlan,
I would recommend pointing the managed folder to the file on your local machine (folder > settings > browse path). Then, you will have the option to create a dataset from that file. You can also achieve this via Python APIs in a notebook or a Python recipe.
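As one possible reading of the Python API route, a file could be pushed into a managed folder with the public `dataikuapi` client, run on the machine that holds the file. This is a rough sketch under assumptions, not something confirmed in this thread: the host URL, API key, project key and folder id are placeholders, the helper names are made up for the example, and whether this copes better than the browser with a file of this size is untested.

```python
from pathlib import Path


def connect_folder(host, api_key, project_key, folder_id):
    """Return a managed-folder handle through the public API client.

    Assumes the `dataiku-api-client` package is installed; all four
    arguments are placeholders you would replace with your own values.
    """
    import dataikuapi  # imported lazily so upload_to_folder works without it

    client = dataikuapi.DSSClient(host, api_key)
    return client.get_project(project_key).get_managed_folder(folder_id)


def upload_to_folder(folder, local_path, remote_name=None):
    """Stream a local file into the folder via its put_file(path, f) method."""
    local_path = Path(local_path)
    remote_name = remote_name or local_path.name
    # Open in binary mode and pass the file object, so the file is not
    # read into memory by this helper.
    with open(local_path, "rb") as f:
        folder.put_file(remote_name, f)
    return remote_name
```

Usage would look something like `upload_to_folder(connect_folder("https://dss.example.com", "MY_API_KEY", "MYPROJECT", "folderId"), "file.csv.zip")`, with every value in that call being hypothetical.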
Please let me know if you have any questions.
Thanks!
Jordan
-
Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 320 Neuron
Hi @JordanB,
Thanks for the suggestion. I don't know how to do what you suggest though. When I click the BROWSE... button on the Folder Settings page, I can navigate to root and see folders there, but I can't figure out how I would specify a path to the file on my local machine.
By local machine, I assume you mean the Windows PC I am using to access DSS via a browser? That's where the large file is in any case.
Thanks,
Marlan
-
Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 320 Neuron
Hi @JordanB,
Compressing the file does help. I was able to upload a 4 GB file that was compressed down to 800 MB. And it appears that the zip file can be read directly as a dataset, which is cool. So this is a helpful option that increases file upload capacity to probably around 8 GB and works without any additional tools or setup (well, besides a compression tool).
We don't have direct access to the Linux server, so the scp option isn't one we could do ourselves. We could work with the Linux admins, but that would get a bit involved for a one-time upload of a file.
Uploading to cloud storage is a good option. We are working on getting that set up and that's likely what we would do in this situation once available.
The goal of uploading the file was to load it into a SQL database.
Thanks for your suggestions.
Marlan