
How to apply custom Python code to multiple csv files in a folder?

Dataiker

I want to do the above in the free version of DSS (v4.0.5). I created a Filesystem dataset and pointed it at the folder containing my CSV input files. All the CSV files have the same schema. However, when I create the dataset, it only appears to 'see' one of the CSV files, so when I run my flow only that one file is processed. I want to process all the files in order (for example, alphabetically by input file name), feeding the data from each file into my custom code one file's worth at a time.



Is there any way I can do this without having to write my custom code so that it opens the folder and processes the files in a loop? (For example, something like the approach at https://answers.dataiku.com/1347/read-csvs-from-a-folder.)

Dataiker
If you use a folder, you will need to read the files one by one in a loop. If you have a lot of files, this is the right solution.
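A minimal sketch of such a loop, using only the Python standard library. It assumes the CSV files are reachable through a local directory path (in DSS, a managed folder stored on the local filesystem can expose one via `dataiku.Folder(...).get_path()`). The helper `process_folder` and the callback it receives are hypothetical names standing in for your own custom code. Files are visited in alphabetical order of file name, and each callback call receives exactly one file's worth of rows.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List

Row = Dict[str, str]


def process_folder(folder: str, process_file: Callable[[str, List[Row]], None]) -> List[str]:
    """Hypothetical helper: feed each CSV file in `folder` to `process_file`,
    one file at a time, in alphabetical order of file name.

    Returns the names of the files that were processed, in order.
    """
    processed: List[str] = []
    # Only *.csv files are picked up; sorting by name gives alphabetical order.
    for path in sorted(Path(folder).glob("*.csv"), key=lambda p: p.name):
        with path.open(newline="", encoding="utf-8") as f:
            # All rows of this one file, keyed by the header row's column names.
            rows = list(csv.DictReader(f))
        # Hand this file's data to the custom per-file code.
        process_file(path.name, rows)
        processed.append(path.name)
    return processed
```

For example, `process_folder("/path/to/csv_folder", my_custom_code)`, where the path is a placeholder and `my_custom_code(name, rows)` is your per-file logic. Reading one file at a time keeps memory use bounded by the largest single file rather than the whole folder, and the file name passed to the callback keeps each input identifiable.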

If you have only a few files, you can upload them one by one and use a Stack recipe to merge all the created datasets into a single one.
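For comparison, a stacking step can also be approximated in code. The sketch below is not the Stack recipe itself; it is a hypothetical standard-library helper, `stack_csv_files`, that concatenates CSV files assumed to share one schema and adds a `source_file` column (a name chosen here for illustration) so that rows from different input files can still be told apart after the merge.

```python
import csv
from pathlib import Path
from typing import Dict, List


def stack_csv_files(folder: str, source_column: str = "source_file") -> List[Dict[str, str]]:
    """Hypothetical helper: concatenate all CSV files in `folder` (assumed to
    share the same columns) into one list of rows, tagging each row with the
    name of the file it came from."""
    stacked: List[Dict[str, str]] = []
    # Files in the same folder, so sorting paths sorts by file name.
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                # Record the origin so stacked rows remain distinguishable.
                row[source_column] = path.name
                stacked.append(row)
    return stacked
```

Unlike the per-file loop, this loads every row into memory at once, which is the trade-off for getting a single merged table.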
Dataiker
Author
Thanks @cperdigou. I wasn't aware of the Stack recipe (https://doc.dataiku.com/dss/latest/other_recipes/stack.html). I tried it, but I think I'll go with the custom code that reads the files in a loop, as it is more flexible and makes it easier to tell the original datasets apart. Thanks!