Changing output of a recipe

Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2022, Neuron 2023 Posts: 67 Neuron

I have a visual recipe that produces an output dataset; let's call it dataset_1. It shows in the flow as:

input_dataset -> recipe -> dataset_1

Then I change the output of the recipe: let's say I now save it locally instead of in SQL, and I call the new output dataset_2. Now the flow shows as:

input_dataset -> recipe -> dataset_2

and dataset_1 is now shown as an "orphan" without any lineage, i.e. the flow no longer shows which input dataset and which recipe produced it. I can still find the input dataset in the Details of dataset_1, but that is not very convenient. Is there a way to keep the lineage of an output dataset in the flow when changing the output of a recipe?

Thanks

David

Answers

  • Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 417 Neuron

    Hi @davidmakovoz. I can think of two ways of doing this:

    a) If you are just changing where the dataset is stored (locally instead of SQL), then you could use a sync recipe: input_dataset -> recipe -> dataset_1 -> sync -> dataset_2. But this only works if all you want is to have the same output stored in two different locations.

    b) If you are making changes in the recipe, so that dataset_2 is actually different from dataset_1, but you would still like to keep the lineage (maybe to compare dataset_1 and dataset_2 in the future?), then I would have two recipes connected to input_dataset, like this:

    [Screenshot: Selection_346.png, a flow with two recipes branching from input_dataset]

    That's how I would solve this problem. There might be other options or tools within DSS that I'm not aware of and that could solve it more elegantly.

    Hope this helps!
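
To make the difference between the layouts concrete, here is a minimal sketch in plain Python. It is a toy model of the flow, not the DSS API: the `Recipe` class, the `orphans` and `upstream` helpers, and the name `recipe_copy` are all hypothetical and exist only for illustration. It treats a dataset as an orphan when no recipe reads from it or writes to it, which is the situation described in the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    # Hypothetical stand-in for a flow recipe: a name plus its inputs and outputs.
    name: str
    inputs: tuple
    outputs: tuple


def upstream(flow, dataset):
    """Return (recipe name, inputs) pairs for every recipe that writes `dataset`."""
    return [(r.name, r.inputs) for r in flow if dataset in r.outputs]


def orphans(flow, datasets):
    """Return the datasets that no recipe in `flow` reads from or writes to."""
    connected = set()
    for r in flow:
        connected.update(r.inputs)
        connected.update(r.outputs)
    return sorted(set(datasets) - connected)


datasets = ["input_dataset", "dataset_1", "dataset_2"]

# The situation in the question: the recipe's output is switched to dataset_2.
retargeted = [Recipe("recipe", ("input_dataset",), ("dataset_2",))]

# Option (a): keep dataset_1 as the output and sync it to the new location.
with_sync = [
    Recipe("recipe", ("input_dataset",), ("dataset_1",)),
    Recipe("sync", ("dataset_1",), ("dataset_2",)),
]

# Option (b): two recipes branching from the same input (recipe_copy is a made-up name).
with_copy = [
    Recipe("recipe", ("input_dataset",), ("dataset_1",)),
    Recipe("recipe_copy", ("input_dataset",), ("dataset_2",)),
]

print(orphans(retargeted, datasets))      # ['dataset_1']
print(orphans(with_sync, datasets))       # []
print(upstream(with_copy, "dataset_1"))   # [('recipe', ('input_dataset',))]
```

In this model, retargeting the output is the only layout that leaves dataset_1 disconnected; with either a sync recipe or a second recipe, dataset_1 still has a recipe and an input upstream of it.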

  • Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2022, Neuron 2023 Posts: 67 Neuron

    Thank you for the suggestions. To address your points:

    1. Actually, it was the opposite: dataset_1 was saved locally and dataset_2 was written to the database.

    2. Yes, I agree: normally, one would fork off of a dataset. In those rare cases when that's not possible, I think one option is simply to copy the recipe and leave a note saying that the two recipes are the same.

  • Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 417 Neuron

    No problem. Just to clarify point 1: it doesn't matter which dataset is saved locally and which one is in SQL, NoSQL, HDFS, etc. The order doesn't affect the result.
