
Upload large (8 GB) files

Marlan
Hi all,

Has anyone had any experience manually uploading large files to a DSS Folder? We are attempting to upload an 8 GB file and part way through the status bar just disappears and it appears that the upload has stopped without completing.

Is this just too large? Or is there something we should be doing differently?

Thanks!

Marlan


Operating system used: Linux Red Hat

JordanB
Dataiker

Hi @Marlan,

I would recommend pointing the managed folder to the file on your local machine (folder > settings > browse path). You will then have the option to create a dataset from that file. You can also do this with the Python APIs in a notebook or a Python recipe.

Please let me know if you have any questions. 

Thanks!

Jordan

Hi @JordanB,

Thanks for the suggestion, but I don't know how to do what you suggest. When I click the BROWSE... button on the Folder Settings page, I can navigate to root and see the folders there, but I can't figure out how I would specify a path to the file on my local machine.

By local machine, I assume you mean the Windows PC I am using to access DSS via a browser? That's where the large file is in any case.

Thanks,

Marlan

JordanB
Dataiker

Hi @Marlan,

The file would need to be accessible from the machine that is hosting DSS in order to point your managed folder to it. Can you please try compressing the file and uploading it once more to your managed folder?
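If a compression tool isn't already at hand, the compression step can be scripted. This is only a minimal sketch using Python's standard library: the function name `gzip_file` and the chunk size are illustrative assumptions, not anything DSS provides. It streams the file in chunks, so an 8 GB file never has to fit in memory.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src, dst=None, chunk_size=16 * 1024 * 1024):
    """Compress src to dst (default: src + '.gz') without loading it into memory.

    gzip_file and chunk_size are hypothetical names chosen for this sketch.
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_name(src.name + ".gz")
    # copyfileobj reads and writes chunk_size bytes at a time
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=6) as fout:
        shutil.copyfileobj(fin, fout, length=chunk_size)
    return dst
```

For example, `gzip_file("file.csv")` would write `file.csv.gz` next to the original. How much smaller the result is depends entirely on the data; text-heavy CSV files tend to compress well.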

You can also transfer the file to the machine that is hosting DSS using scp and then point the managed folder to it as you had done earlier:

scp file.csv remote_username@ip:/remote/directory

Another option would be to load the file to a cloud storage connection or SQL database and either create a dataset from there or point a folder to it. 

Let us know if you have any questions.

Thanks!

Jordan

Hi @JordanB,

Compressing the file does help. I was able to upload a 4 GB file that was compressed down to 800 MB, and it appears that the zip file can be read directly as a dataset, which is cool. So this is a helpful option that probably raises the practical upload limit to around 8 GB of uncompressed data and works without any additional tools or setup (well, besides a compression tool).
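The same "read it without unzipping" idea can be checked outside DSS before uploading. The sketch below is an assumption-laden illustration with plain Python (`iter_zip_csv_rows` is a made-up helper name, and it assumes the archive holds a single UTF-8 CSV); it is not how DSS itself reads the zip.

```python
import csv
import io
import zipfile


def iter_zip_csv_rows(zip_path, member=None, encoding="utf-8"):
    """Yield rows (as lists of strings) from a CSV inside a zip, without extracting it.

    iter_zip_csv_rows is a hypothetical helper; if member is None the first
    file in the archive is used.
    """
    with zipfile.ZipFile(zip_path) as archive:
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            # Wrap the binary stream so csv.reader receives decoded text
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            yield from csv.reader(text)
```

Looping over `iter_zip_csv_rows("file.zip")` and printing the first few rows is a quick way to confirm the archive is intact before starting a long upload.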

We don't have direct access to the Linux server so the scp option isn't one we could do ourselves. We could work with the Linux admins but that would get a bit involved for a one-time upload of a file.

Uploading to cloud storage is a good option. We are working on getting that set up and that's likely what we would do in this situation once available.

The goal of uploading the file was to load it into a SQL database.
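For what it's worth, here is a rough sketch of that end goal done in batches, so the whole file is never held in memory. It uses SQLite from the standard library purely as a stand-in for our real database, and the function name `load_csv_gz`, the table name, and the batch size are all assumptions for illustration. It also assumes the CSV has a header row and that the table name comes from a trusted source, since it is interpolated into the SQL.

```python
import csv
import gzip
import sqlite3
from itertools import islice


def load_csv_gz(conn, gz_path, table, batch_size=50_000):
    """Insert rows from a gzip-compressed CSV (with header) into `table` in batches.

    Hypothetical helper: sqlite3 stands in for the target SQL database.
    Returns the number of data rows inserted.
    """
    with gzip.open(gz_path, "rt", newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
        insert = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
        total = 0
        while True:
            # Pull at most batch_size rows from the reader at a time
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            conn.executemany(insert, batch)
            conn.commit()
            total += len(batch)
    return total
```

With a real database the connection and placeholder syntax would differ by driver, but the batching pattern should carry over.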

Thanks for your suggestions.

Marlan

JordanB
Dataiker

Hi @Marlan,

I'm glad the compression helped! Once you have Cloud Storage connected, that will likely be the most convenient option for large files and loading them to a SQL database.

Thanks!

Jordan
