How to load a large HTTP dataset in DSS?

UserBird
Dataiker
Hi,

I'm currently working with Dataiku deployed on a cluster.

I want to load an HDFS dataset from a file served over HTTP (a ZIP containing a CSV). The Network dataset interface in DSS works without problems for this. The problem is when I want to load a large CSV file over HTTP (8 GB): Dataiku can't detect it and just returns a preview of the JSON.

Is it possible to load a large file directly into DSS, or do I need to create a recipe for this big file?

Or is there another solution?
cperdigou
Dataiker Alumni
I'm sorry, I'm not sure I understand your problem exactly.

What do you mean by an HTTP file?

Is it a file that is currently on a remote server? In that case you can connect to it to create a dataset, then sync it to HDFS.
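
The visual Sync recipe is the usual way to do the second step, but if memory is a concern you can also write the copy as a Python recipe that streams the data in chunks. A minimal sketch, assuming a Python recipe whose input is an HTTP dataset named http_source and whose output is an HDFS dataset named hdfs_target (both names are placeholders):

import dataiku

# Placeholders: the recipe's input HTTP dataset and output HDFS dataset
src = dataiku.Dataset("http_source")
dst = dataiku.Dataset("hdfs_target")

# Give the output the same columns as the input
dst.write_schema(src.read_schema())

# Copy in chunks so an 8 GB file never has to fit in memory at once
with dst.get_writer() as writer:
    for chunk in src.iter_dataframes(chunksize=100000):
        writer.write_dataframe(chunk)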

Is it a file that is currently on your local computer? In that case, I suspect the upload fails because of its size; try scp-ing the file directly onto the server, then creating a Filesystem dataset pointing to that file.
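
If instead the file sits on a remote HTTP server and the built-in download chokes on its size, a variant of the same idea is to pull it down to the DSS server yourself, streaming to disk, then create a Filesystem dataset pointing at the extracted CSV. A rough sketch using Python's requests and zipfile; the URL and paths are made-up placeholders:

import zipfile
import requests

url = "http://example.com/big-archive.zip"  # placeholder for the real archive URL
archive_path = "/data/big-archive.zip"      # placeholder landing path on the DSS server

# Stream the download in 1 MB chunks so the whole archive is never held in memory
with requests.get(url, stream=True) as resp:
    resp.raise_for_status()
    with open(archive_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)

# Unzip next to the archive; then point a Filesystem dataset at the extracted CSV
with zipfile.ZipFile(archive_path) as zf:
    zf.extractall("/data/extracted/")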
Helka
Level 1
Hi cperdigou,

My CSV file is stored on a remote server (inside a ZIP file: http://files.data.gouv.fr/sirene/). When I use the Network dataset interface in DSS I have no problem connecting to a smaller CSV (100 MB), but with the 8 GB file I can't connect to it to create a dataset because it's too big.

Dataiku is installed on a VM alongside a Hadoop cluster. Could I use the power of Hadoop to load this big data file into Dataiku?