Added on October 1, 2020 1:04PM
I have a visual recipe that produces an output dataset, let's call it dataset_1. It shows in the flow as:
input_dataset -> recipe -> dataset_1
Then I change the output of the recipe, let's say I now save it locally instead of in SQL, and I call it dataset_2. Now the flow shows as:
input_dataset -> recipe -> dataset_2
and dataset_1 is now shown as an "orphan" without any lineage, i.e. the flow no longer shows which input dataset and which recipe produced it. I can find the input dataset in the Details of dataset_1, but even that is not very convenient. Is there a way to keep the lineage of an output dataset in the flow when changing the output of a recipe?
Thanks
David
Hi @davidmakovoz. I can think of two ways of doing this:
a) If you are just changing the dataset to store it locally instead of in SQL, then you could use a sync recipe: input_dataset -> recipe -> dataset_1 -> sync -> dataset_2. But this only works if all you want is to have the same output stored in two different locations.
b) If you are making changes in the recipe, so that dataset_2 is actually different from dataset_1, but you would still like to keep the lineage (maybe to compare dataset_1 and dataset_2 in the future?), then I would have two recipes connected to input_dataset, like this:
input_dataset -> recipe -> dataset_1
input_dataset -> recipe (copy) -> dataset_2
That's how I would solve this problem. But there might be other options or tools within DSS, that I'm not aware of, that could solve it in a more elegant way.
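To make the difference between the layouts concrete, here is a rough sketch in plain Python. It does not use the DSS API at all: it just models a flow as a list of (source, target) edges and walks backwards from a dataset to find its lineage. The helper `upstream` and the node names `sync` and `recipe_copy` are made up for the illustration.

```python
from collections import defaultdict


def upstream(edges, node):
    """Return every node reachable by walking the edges backwards from `node`."""
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)
    seen, stack = set(), [node]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Original problem: the recipe output is retargeted, so dataset_1
# no longer has any incoming edge.
retargeted = [("input_dataset", "recipe"), ("recipe", "dataset_2")]

# Option (a): keep dataset_1 as the recipe output and sync it to dataset_2.
with_sync = [
    ("input_dataset", "recipe"), ("recipe", "dataset_1"),
    ("dataset_1", "sync"), ("sync", "dataset_2"),
]

# Option (b): a second (hypothetical) copy of the recipe writes dataset_2.
forked = [
    ("input_dataset", "recipe"), ("recipe", "dataset_1"),
    ("input_dataset", "recipe_copy"), ("recipe_copy", "dataset_2"),
]

for name, flow in [("retargeted", retargeted), ("sync", with_sync), ("fork", forked)]:
    print(name, "-> lineage of dataset_1:", sorted(upstream(flow, "dataset_1")))
```

In the retargeted flow the lineage of dataset_1 comes back empty, which is the "orphan" case from the question. In both the sync and the fork layouts, dataset_1 still traces back to input_dataset.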
Hope this helps!
Thank you for the suggestions. To address your points:
1. Actually, it was the opposite: dataset_1 was saved locally and dataset_2 was written to the database.
2. Yes, I agree, normally one would fork off of a dataset. In those rare cases when that's not possible, I think one option is simply to copy the recipe and leave a note or something saying that the two recipes are the same.
No problem. Just to clarify, for point 1, it doesn't matter which dataset is saved locally and which is in SQL, NoSQL, HDFS, etc. The order doesn't affect the result.