
Million Files


I would like to set up a process to handle 13 million JSON files, with more coming.

I tried to create a 'Files in Folder', but it doesn't seem to be a good idea.
Any ideas?
- Would using HDFS help?
- Would partitioning help?

Regards.

Dataiker

Hi Pascal,

Well, our reply will not be very different from the one we sent just a few hours ago 🙂

DSS was not really designed with this number of files in mind, and your experience will range from difficult to impossible.

We'd advise you to try using code with the native APIs of your underlying storage engine. Even so, expect a very painful experience: no data management tool is going to be fast or pleasant at these volumes. If at all possible, we'd strongly urge you to reconsider the design of the system producing these files.
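
As a rough illustration of the "use code" route, here is a minimal Python sketch that merges many small JSON files into a handful of JSON Lines shards, so that downstream tools see a few large files instead of millions of tiny ones. It is only a sketch under assumptions: the files are assumed to sit on a local or mounted filesystem, each file is assumed to hold exactly one JSON document, and the names `iter_json_paths`, `consolidate` and `files_per_shard` are made up for this example (they are not DSS APIs). On HDFS or an object store, the listing and reading steps would need that engine's own client instead of `os.scandir` and `open`.

```python
import json
import os
from pathlib import Path


def iter_json_paths(src_dir):
    """Yield paths of .json files under src_dir without building a full listing."""
    stack = [src_dir]
    while stack:
        current = stack.pop()
        # os.scandir streams directory entries instead of loading them all at once.
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.name.endswith(".json"):
                    yield entry.path


def consolidate(src_dir, dst_dir, files_per_shard=100_000):
    """Merge small JSON files into JSON Lines shards (hypothetical helper).

    Assumes each input file holds one JSON document.
    Returns the number of shards written.
    """
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    shard_index = 0
    count_in_shard = 0
    out = None
    try:
        for path in iter_json_paths(src_dir):
            if out is None:
                shard_path = os.path.join(dst_dir, f"part-{shard_index:05d}.jsonl")
                out = open(shard_path, "w", encoding="utf-8")
            with open(path, "r", encoding="utf-8") as f:
                record = json.load(f)
            # One compact JSON document per line.
            out.write(json.dumps(record) + "\n")
            count_in_shard += 1
            if count_in_shard >= files_per_shard:
                out.close()
                out = None
                shard_index += 1
                count_in_shard = 0
    finally:
        # Close a partially filled last shard, if any.
        if out is not None:
            out.close()
            shard_index += 1
    return shard_index
```

Something like `consolidate("/data/raw_json", "/data/merged")` (both paths are placeholders) would then leave a small number of `part-*.jsonl` files that are far easier to handle than the original 13 million. Keep the output directory outside the source tree so that a second run does not have to walk over its own output.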
