Changing output of a recipe

davidmakovoz

I have a visual recipe and produce an output dataset, let's call it dataset_1. It shows in the flow as:

input_dataset -> recipe -> dataset_1

Then I change the output of the recipe: let's say I now save it locally instead of in SQL, and I call it dataset_2. Now the flow shows as:

input_dataset -> recipe -> dataset_2

and dataset_1 is now shown as an "orphan" without any lineage, i.e. the flow no longer shows which input dataset and which recipe produced it. I can find the input dataset in the Details of dataset_1, but even that is not very convenient. Is there a way to keep the lineage of an output dataset in the flow when changing the output of a recipe?
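
To make the situation concrete, here is a toy model in plain Python. It is not the Dataiku API: the dictionary layout and the `upstream` helper are invented for this illustration, and it only mimics how the flow draws edges from a recipe to its current outputs.

```python
# Toy model of a flow: each recipe has a list of inputs and outputs.
# This is an illustration only, not the Dataiku API.
flow = {"recipe": {"inputs": ["input_dataset"], "outputs": ["dataset_1"]}}


def upstream(flow, dataset):
    """Return (recipe_name, inputs) pairs for every recipe that writes `dataset`."""
    return [
        (name, recipe["inputs"])
        for name, recipe in flow.items()
        if dataset in recipe["outputs"]
    ]


# Lineage of dataset_1 while the recipe still writes to it.
before = upstream(flow, "dataset_1")

# Changing the recipe's output replaces the only edge that pointed at dataset_1.
flow["recipe"]["outputs"] = ["dataset_2"]

# dataset_1 still exists as a dataset, but nothing in the flow produces it any more.
after = upstream(flow, "dataset_1")

print(before)  # [('recipe', ['input_dataset'])]
print(after)   # []
```

In this model the lineage lives only on the recipe, which is why dataset_1 has nothing upstream once the output is swapped.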

Thanks

David

Answers

  • Ignacio_Toledo

    Hi @davidmakovoz. I can think of two ways of doing this:

    a) If you are just changing the dataset to store it locally instead of in SQL, then you could use a sync recipe: input_dataset -> recipe -> dataset_1 -> sync -> dataset_2. But this only works if all you want is to have the same output stored in two different locations.

    b) If you are making changes in the recipe, so that dataset_2 is actually different from dataset_1, but you would still like to keep the lineage (maybe to compare dataset_1 and dataset_2 in the future?), then I would have two recipes connected to input_dataset, like this:

    [Image: Selection_346.png]

    That's how I would solve this problem. But there might be other options or tools within DSS that I'm not aware of and that could solve it in a more elegant way.

    Hope this helps!

  • davidmakovoz

    Thank you for the suggestions. To address your points:

    1. Actually, it was the opposite, dataset_1 was saved locally and dataset_2 was written in the database.

    2. Yes, I agree: normally one would fork off of a dataset. In those rare cases when that's not possible, I think one option is simply to copy the recipe and leave a note saying that the two recipes are the same.

  • Ignacio_Toledo

    No problem. Just to clarify, for point 1, it doesn't matter which dataset is saved locally and which in SQL, NoSQL, HDFS, etc. The order doesn't affect the result.
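
The two suggestions from this thread can be sketched with a toy graph in plain Python. This is only an illustration of why both keep dataset_1 connected, not Dataiku code: the dictionary layout, the `lineage` helper, and the name `recipe_copy` are all made up for the example.

```python
def lineage(flow, dataset):
    """Walk upstream from `dataset` and return the recipes found, nearest first."""
    chain = []
    frontier = [dataset]
    while frontier:
        current = frontier.pop()
        for name, recipe in flow.items():
            if current in recipe["outputs"]:
                chain.append(name)
                frontier.extend(recipe["inputs"])
    return chain


# Option a: keep the original recipe and add a sync recipe after dataset_1.
option_a = {
    "recipe": {"inputs": ["input_dataset"], "outputs": ["dataset_1"]},
    "sync": {"inputs": ["dataset_1"], "outputs": ["dataset_2"]},
}

# Option b: a second (copied or modified) recipe reading the same input.
option_b = {
    "recipe": {"inputs": ["input_dataset"], "outputs": ["dataset_1"]},
    "recipe_copy": {"inputs": ["input_dataset"], "outputs": ["dataset_2"]},
}

for label, flow in (("a", option_a), ("b", option_b)):
    print(label, lineage(flow, "dataset_1"), lineage(flow, "dataset_2"))
```

In both layouts dataset_1 keeps a recipe upstream of it, because the original recipe is never repointed; the new output is produced by an additional recipe instead.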
