How to load a large HTTP dataset in DSS?

UserBird Dataiker, Alpha Tester Posts: 535
Hi,

I'm currently working with Dataiku installed on a cluster.

I want to load an HDFS dataset from a file served over HTTP (a ZIP containing a CSV). The Network dataset interface in DSS works fine for this. The problem is when I want to load a large CSV file over HTTP (8 GB): Dataiku can't detect it and returns a preview of the JSON instead.

Is it possible to load such a large file directly in DSS, or do I need to create a recipe for this big file?

Is there another solution?

Answers

  • cperdigou Alpha Tester, Dataiker Alumni Posts: 115
    I'm sorry, I'm not sure I understand your problem exactly.

    What do you mean by an HTTP file?

    Is it a file that is currently on a remote server? In that case you can connect to it to create a dataset, then sync it to HDFS (see the sketch at the end of this reply).

    Is it a file that is currently on your local computer? In that case, I would guess the upload fails because of the file's size; try scp'ing the file directly to the server, then create a filesystem dataset pointing to that file.
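
    If you want to do the sync step in code rather than with a visual Sync recipe, here is a minimal Python recipe sketch that streams the HTTP dataset into an HDFS dataset chunk by chunk. The dataset names "remote_http_file" and "sirene_hdfs" are placeholders for your actual input and output datasets:

        import dataiku

        # Placeholder names: the HTTP dataset created in the UI (input)
        # and a managed dataset on the HDFS connection (output).
        src = dataiku.Dataset("remote_http_file")
        dst = dataiku.Dataset("sirene_hdfs")

        # Copy the schema once, then stream the rows in chunks so the
        # whole file never has to fit in memory.
        dst.write_schema(src.read_schema())
        with dst.get_writer() as writer:
            for chunk in src.iter_dataframes(chunksize=100000):
                writer.write_dataframe(chunk)

    This does roughly what a visual Sync recipe does, but gives you control over the chunk size.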
  • Helka Registered Posts: 1
    Hi cperdigou,

    My CSV file is stored online on a remote server (in a ZIP file: http://files.data.gouv.fr/sirene/). When I use the Network dataset interface in DSS I have no problem connecting to a smaller CSV (100 MB). But with the 8 GB file I can't connect to it to create a dataset because it's too big.

    Dataiku is installed on a VM with a Hadoop cluster. Maybe I can use the power of Hadoop to load this big data file into Dataiku? (A possible workaround is sketched below.)
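
    One possible workaround, if the UI cannot create the dataset, is a Python recipe that downloads the ZIP itself, extracts the CSV, and appends it chunk by chunk to an output dataset stored on the HDFS connection. This is only a sketch: the exact ZIP URL, the CSV name inside the archive, and the output dataset name "sirene" are placeholders to adjust.

        import os
        import tempfile
        import zipfile

        import dataiku
        import pandas as pd
        import requests

        # Placeholders: pick the actual ZIP under
        # http://files.data.gouv.fr/sirene/ and the CSV it contains.
        ZIP_URL = "http://files.data.gouv.fr/sirene/sirene.zip"
        CSV_NAME = "sirene.csv"

        # Output dataset, e.g. a managed dataset on the HDFS connection.
        output = dataiku.Dataset("sirene")

        # 1. Stream the ZIP to local disk so it never sits in memory.
        local_zip = os.path.join(tempfile.gettempdir(), "sirene.zip")
        with requests.get(ZIP_URL, stream=True) as resp:
            resp.raise_for_status()
            with open(local_zip, "wb") as f:
                for block in resp.iter_content(chunk_size=1024 * 1024):
                    f.write(block)

        # 2. Read the CSV out of the archive in chunks and append each
        #    chunk to the output dataset (sep/encoding may need tuning).
        with zipfile.ZipFile(local_zip) as archive:
            with archive.open(CSV_NAME) as csv_file:
                writer = None
                try:
                    for chunk in pd.read_csv(csv_file, chunksize=100000):
                        if writer is None:
                            # Set the schema from the first chunk, then
                            # open the writer for the remaining ones.
                            output.write_schema_from_dataframe(chunk)
                            writer = output.get_writer()
                        writer.write_dataframe(chunk)
                finally:
                    if writer is not None:
                        writer.close()

        os.remove(local_zip)  # clean up the temporary download

    The download still happens on the DSS server, but only in 1 MB blocks, and the rows are written to HDFS as they are parsed, so the 8 GB file never has to fit in memory.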