data historization

Kevin_dataiku8
Level 2

Hello to all Dataiku users,

I'm writing because I have a problem that I hope you can help me with (a diagram of the problem is attached).

I'm looking for a way to keep a history of some datasets (shown in green), stored as CSV or Excel files on HDFS. Ideally, a copy of these datasets would be saved to an HDFS subfolder at each run.

To explain: at each run of the flow zone, I would like the intermediate dataset and the final dataset to be stored in a subfolder in HDFS.
The goal is to be able to compare the versions produced by each run (because I may change things in my recipes between runs).

I'm not sure I've been very clear. Thanks for your help.

2 Replies
tgb417

@Kevin_dataiku8 

Take a look at the following Dataiku features and see if they can help you.

  • Partitioned Datasets
  • Append Recipe Options on Output datasets.
  • Export to Folder

This thread seems to cover some of this: https://community.dataiku.com/t5/Using-Dataiku/how-can-select-the-append-mode-in-a-dataset/td-p/3367 

I know that I've been able to use these two features to achieve something like what I think you want to do.

There is another discussion about doing something like this through Python.

https://community.dataiku.com/t5/Using-Dataiku/How-to-add-data-to-a-existing-dataset-with-python/m-p...
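
For the Python route, the general idea of "one subfolder per run" could be sketched roughly as below. This is plain Python writing to a local path as a stand-in for HDFS, not Dataiku API: the function name `snapshot_rows`, the folder layout, and the timestamp format are all hypothetical choices for illustration. In a Dataiku Python recipe you would presumably read the rows from your dataset and write into a managed folder instead.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def snapshot_rows(rows, header, base_dir, dataset_name, run_time=None):
    """Write rows to <base_dir>/<dataset_name>/<run timestamp>/data.csv.

    Each call with a different run_time lands in its own subfolder,
    so earlier snapshots are never overwritten. Returns the file path.
    """
    # Assumption: a second-resolution UTC timestamp is unique enough per run.
    run_time = run_time or datetime.now(timezone.utc)
    run_id = run_time.strftime("%Y%m%dT%H%M%S")

    target_dir = Path(base_dir) / dataset_name / run_id
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / "data.csv"
    with target.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return target
```

Calling it once for the intermediate dataset and once for the final dataset at the end of each run would leave one timestamped folder per run under each dataset name, which you could then compare side by side.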

 

--Tom
Kevin_dataiku8
Level 2
Author

Dear tgb417,

Thank you very much, I'm looking into it.