data historization
Hello to all Dataiku users,
I am writing to you because I have a problem and I hope you can help me. (A diagram of the problem is attached.)
I'm looking for a way to historize some datasets (in green) as CSV or Excel files on HDFS. I need to keep a history of these datasets on HDFS, for example in a subfolder created at each run.
To explain: at each run of the flow zone, I would like the intermediate dataset and the final dataset to be stored in a subfolder on HDFS.
The goal is to be able to compare the different versions from each run, because I may modify things in my recipes between runs.
I don't know if I'm being very clear. Thanks for your help.
Answers
-
tgb417
Take a look at the following Dataiku features and see if they can help you:
- Partitioned Datasets
- Append Recipe Options on Output datasets.
- Export to Folder
This thread seems to cover some of this: https://community.dataiku.com/t5/Using-Dataiku/how-can-select-the-append-mode-in-a-dataset/td-p/3367
I know that I've been able to use these two features to achieve something like what I think you want to do.
There is another discussion about doing something like this through Python.
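As a rough illustration of the Python route, here is a minimal sketch that writes each dataset as a CSV file into a timestamped subfolder per run. The function names `run_subfolder` and `snapshot_datasets`, the `run_<timestamp>` naming scheme, and the dataset and folder names in the comments are all hypothetical. The storage call is passed in as a plain callable so the logic stays independent of where the files end up; in a Dataiku Python recipe I would expect a managed folder's `upload_data(path, data)` method to fit that slot, but please check that against your DSS version.

```python
from datetime import datetime, timezone
from typing import Callable, List, Mapping, Optional


def run_subfolder(run_time: datetime) -> str:
    """Return a sortable subfolder name for one run, based on its UTC time."""
    return run_time.astimezone(timezone.utc).strftime("run_%Y%m%dT%H%M%SZ")


def snapshot_datasets(
    csv_by_dataset: Mapping[str, bytes],
    upload: Callable[[str, bytes], None],
    run_time: Optional[datetime] = None,
) -> List[str]:
    """Store one CSV per dataset under a subfolder that is unique to this run.

    csv_by_dataset maps a dataset name to its CSV content as bytes.
    upload(path, data) is whatever writes bytes to the history location.
    Returns the relative paths that were written.
    """
    run_time = run_time or datetime.now(timezone.utc)
    subfolder = run_subfolder(run_time)
    written = []
    for name, csv_bytes in csv_by_dataset.items():
        path = f"{subfolder}/{name}.csv"
        upload(path, csv_bytes)
        written.append(path)
    return written


# Possible use inside a Dataiku Python recipe (names are assumptions):
#
#   import dataiku
#   history = dataiku.Folder("history")  # managed folder stored on HDFS
#   names = ["intermediate", "final"]
#   csvs = {
#       n: dataiku.Dataset(n).get_dataframe().to_csv(index=False).encode("utf-8")
#       for n in names
#   }
#   snapshot_datasets(csvs, history.upload_data)
```

Because every run gets its own subfolder, earlier versions are never overwritten, and two runs can be compared by reading the same file name from two subfolders.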
-
Dear tgb417,
Thank you very much, I'm looking into it.