data historization

Hello to all Dataiku users,

I am writing because I have a problem and hope you can help. (A diagram of the problem is attached.)

I'm looking for a way to historize some datasets (the ones in green, stored as CSV or Excel on HDFS). I need to keep a history of these datasets in HDFS, for example in a subfolder created at each run.

To explain: at each run of the flow zone, I would like the intermediate dataset and the final dataset to be stored in a subfolder in HDFS.
The goal is to be able to compare the versions produced by different runs, since I may modify things in my recipes between runs.
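As a rough illustration of the per-run subfolder idea, here is a minimal stdlib-only Python sketch. It assumes the outputs are CSV files on a local filesystem standing in for HDFS; in a Dataiku Python recipe you would instead write into a managed folder through the `dataiku` package (check the managed folder API in the Dataiku documentation for the exact calls). The function names `snapshot_run` and `compare_runs` and the timestamp-based run ID are hypothetical choices for this example.

```python
import csv
import shutil
from datetime import datetime, timezone
from pathlib import Path


def snapshot_run(files, history_root, run_id=None):
    """Copy each output file into history_root/<run_id>/ and return that folder."""
    if run_id is None:
        # UTC timestamp, so run folders sort chronologically by name
        run_id = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = Path(history_root) / run_id
    # exist_ok=False raises FileExistsError instead of overwriting an earlier run
    run_dir.mkdir(parents=True, exist_ok=False)
    for f in files:
        shutil.copy2(f, run_dir / Path(f).name)
    return run_dir


def compare_runs(history_root, run_a, run_b, filename):
    """Return (rows only in run_a, rows only in run_b) for one CSV file."""
    def load(run):
        with open(Path(history_root) / run / filename, newline="") as fh:
            return {tuple(row) for row in csv.reader(fh)}

    a, b = load(run_a), load(run_b)
    # Sorted lists give a stable, readable result
    return sorted(a - b), sorted(b - a)
```

Because `compare_runs` compares rows as sets, it ignores row order and duplicate rows; a keyed comparison (joining on an ID column) would be needed to see exactly which records changed value.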

I hope this is clear. Thanks for your help!

Take a look at the following Dataiku features to see if they can help you:

  • Partitioned Datasets
  • Append option on recipe output datasets
  • Export to Folder

This seems to cover some of it.

I have been able to use these two features to achieve something like what I think you want to do.

There is also another discussion about doing something like this with Python.
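The append-style alternative can also be sketched in plain Python: rather than one subfolder per run, each run's rows are appended to a single history file with an extra `run_id` column, which is conceptually what appending to an output dataset (or partitioning it by run) gives you. This is not Dataiku's implementation; `append_run`, `rows_for_run`, and the `run_id` column are hypothetical, and the sketch assumes every run produces a CSV with the same columns.

```python
import csv
from pathlib import Path


def append_run(output_csv, history_csv, run_id):
    """Append every row of output_csv to history_csv, tagged with run_id."""
    history = Path(history_csv)
    # Only the very first run writes the header line
    write_header = not history.exists()
    with open(output_csv, newline="") as src, open(history, "a", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        if write_header:
            writer.writerow(["run_id"] + header)
        for row in reader:
            writer.writerow([run_id] + row)


def rows_for_run(history_csv, run_id):
    """Return the rows (without the run_id column) recorded for one run."""
    with open(history_csv, newline="") as fh:
        reader = csv.reader(fh)
        next(reader)  # skip the header line
        return [row[1:] for row in reader if row[0] == run_id]
```

A single history table makes cross-run queries easy (filter or group by `run_id`), while the subfolder approach keeps each run's files untouched; which fits better depends on whether the schema changes between runs.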

Dear tgb417,

Thank you very much, I'm looking into it.