Million Files

pbazin Registered Posts: 4 ✭✭✭✭

I would like to set up a process to handle 13 million (and more coming) JSON files.

I tried to create a 'Files in Folder', but it's not a good idea.
Any ideas?
- Would using HDFS help?
- Would partitioning help?




  • Clément_Stenac, Dataiker, Dataiku DSS Core Designer, Posts: 753

    Hi Pascal,

    Well, our reply will not be very different from the one we sent just a few hours ago:

    DSS was not really designed with this kind of number of files in mind, and your experience will range from impossible to difficult.

    We'd advise you to try using code with the native APIs of your underlying storage engine. Even then, expect a very painful experience: no data management tool is going to be fast or pleasant with this many files. If you have any possibility to do so, we'd strongly urge you to reconsider the design of the system producing them.
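As a rough illustration of the "code with native APIs" route, here is a minimal sketch that compacts many small JSON files into a few JSON Lines files, which most tools handle far more comfortably. It is not a DSS API and not necessarily what the reply had in mind. It assumes the files sit on a local or mounted filesystem (hence `os.scandir`), that each file holds exactly one valid JSON document, and that the names `iter_json_paths`, `compact`, and the batch size of 100,000 are purely illustrative. For HDFS or an object store you would swap the listing and read calls for that engine's own client.

```python
import json
import os
from pathlib import Path


def iter_json_paths(root):
    """Yield paths of .json files under root lazily, without building a full listing."""
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.name.endswith(".json"):
                    yield entry.path


def compact(root, out_dir, records_per_file=100_000):
    """Merge small JSON files into JSON Lines files; return the number of files merged."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    writer = None
    try:
        for path in iter_json_paths(root):
            # Start a new output file every records_per_file records.
            if count % records_per_file == 0:
                if writer is not None:
                    writer.close()
                part = count // records_per_file
                writer = open(out / f"part-{part:05d}.jsonl", "w", encoding="utf-8")
            with open(path, encoding="utf-8") as f:
                record = json.load(f)  # assumption: one JSON document per file
            writer.write(json.dumps(record) + "\n")
            count += 1
    finally:
        if writer is not None:
            writer.close()
    return count
```

Something like `compact("/data/raw", "/data/compacted")` (hypothetical paths) would then leave on the order of a hundred or so files for 13 million inputs. The sketch stops at the first malformed file; in practice you would probably want to log and skip those, and to run it incrementally as new files arrive.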
