Changing output of a recipe

I have a visual recipe that produces an output dataset; let's call it dataset_1. It shows in the Flow as:

input_dataset -> recipe -> dataset_1

Then I change the output of the recipe: let's say I now save it locally instead of in SQL, and I call it dataset_2. Now the Flow shows:

input_dataset -> recipe -> dataset_2

and dataset_1 is now shown as an "orphan" without any lineage, i.e. the Flow no longer shows which input dataset and which recipe produced it. I can find the input dataset in the Details of dataset_1, but even that is not very convenient. Is there a way to keep the lineage of an output dataset in the Flow when changing the output of a recipe?
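To see why dataset_1 ends up orphaned, here is a toy Python model of a Flow. It is an assumption for illustration only, not the Dataiku API: the project keeps a set of datasets, and the Flow's edges come only from each recipe's current inputs and outputs. Repointing the recipe's output removes the only edge attached to dataset_1, while the dataset itself stays in the project.

```python
# Toy model of a Flow (hypothetical, not the Dataiku API): the project keeps a
# set of datasets, and lineage edges come only from each recipe's current
# inputs and outputs.
datasets = {"input_dataset", "dataset_1"}
recipes = {"recipe": {"inputs": ["input_dataset"], "outputs": ["dataset_1"]}}


def change_output(recipes, datasets, recipe_name, new_output):
    """Point a recipe at a new output; the old output dataset is not deleted."""
    recipes[recipe_name]["outputs"] = [new_output]
    datasets.add(new_output)


def orphans(recipes, datasets):
    """Return the datasets that no recipe reads from or writes to."""
    connected = set()
    for recipe in recipes.values():
        connected.update(recipe["inputs"], recipe["outputs"])
    return datasets - connected


change_output(recipes, datasets, "recipe", "dataset_2")
print(orphans(recipes, datasets))  # {'dataset_1'}
```

Nothing in the model records that dataset_1 used to come from recipe, which matches what the Flow shows after the change.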

Thanks

David

Hi @davidmakovoz, I can think of two ways of doing this:

a) If you are only changing where the dataset is stored (locally instead of in SQL), you could use a Sync recipe: input_dataset -> recipe -> dataset_1 -> sync -> dataset_2. But this only works if all you want is to have the same output stored in two different locations.

b) If you are making changes in the recipe, so that dataset_2 is actually different from dataset_1, but you would still like to keep the lineage (maybe to compare dataset_1 and dataset_2 in the future?), then I would have two recipes connected to input_dataset, like this:

[Screenshot: input_dataset feeding two separate recipes, one producing dataset_1 and the other producing dataset_2]
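Both options can be compared with a toy lineage model in Python. This is a hypothetical sketch, not the DSS API: each recipe is a (name, inputs, outputs) tuple, and upstream() walks back through the recipes that produce a dataset. The recipe names sync, recipe_1 and recipe_2 are made up for illustration.

```python
# Toy lineage model (hypothetical, not the DSS API): a Flow is a list of
# (recipe_name, inputs, outputs) tuples and is assumed to be acyclic.
def upstream(flow, dataset):
    """Return every recipe and dataset that `dataset` is derived from."""
    ancestors = set()
    for name, inputs, outputs in flow:
        if dataset in outputs:
            ancestors.add(name)
            for source in inputs:
                ancestors.add(source)
                ancestors |= upstream(flow, source)
    return ancestors


# Recipe output simply changed: dataset_1 loses its lineage.
after_change = [("recipe", ["input_dataset"], ["dataset_2"])]

# Option a): keep dataset_1 and copy it to dataset_2 with a Sync recipe.
option_a = [
    ("recipe", ["input_dataset"], ["dataset_1"]),
    ("sync", ["dataset_1"], ["dataset_2"]),
]

# Option b): two separate recipes reading the same input.
option_b = [
    ("recipe_1", ["input_dataset"], ["dataset_1"]),
    ("recipe_2", ["input_dataset"], ["dataset_2"]),
]

print(sorted(upstream(after_change, "dataset_1")))  # []
print(sorted(upstream(option_a, "dataset_2")))  # ['dataset_1', 'input_dataset', 'recipe', 'sync']
print(sorted(upstream(option_b, "dataset_2")))  # ['input_dataset', 'recipe_2']
```

In option a) the lineage of dataset_2 runs through dataset_1, which is why it only fits when dataset_2 is a copy of dataset_1; in option b) each output has its own recipe, so the two datasets can differ while both still trace back to input_dataset.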

That's how I would solve this problem, but there may be other options or tools within DSS that I'm not aware of that could solve it more elegantly.

Hope this helps!

Thank you for the suggestions. To address your points:

1. Actually, it was the opposite: dataset_1 was saved locally and dataset_2 was written to the database.

2. Yes, I agree: normally one would fork off a dataset. In the rare cases when that's not possible, I think one option is simply to copy the recipe and leave a note saying that the two recipes are the same.

No problem. Just to clarify, for point 1, it doesn't matter which dataset is stored locally and which in SQL, NoSQL, HDFS, etc.; the order doesn't affect the result.