How to implement a feedback loop on a dataset?
Hello,
Each month, I have to compute a dataset that takes the previous month's dataset (M-1) and adds some data to it.
I wonder how to do this in Dataiku, since the recipe would need to take the last output dataset (M-1) as its input.
I don't think it is currently possible to create a feedback loop in Dataiku. Can you confirm?
How could I achieve this computation in Dataiku? The "append-only" feature is not a good answer, because before writing anything, I need to read last month's output dataset to know what will be new in the current month's output.
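The computation described in the question boils down to: read last month's output, keep only the incoming rows that are not already in it, and write the union as this month's output. A minimal plain-Python sketch of that logic, assuming each row is a dict identified by a hypothetical `id` column (no Dataiku API involved):

```python
def next_month(previous, incoming, key="id"):
    """Return (added, combined): the rows of `incoming` whose key is not already
    in `previous` (last month's output), and this month's full dataset."""
    seen = {row[key] for row in previous}
    added = []
    for row in incoming:
        if row[key] not in seen:
            added.append(row)
            seen.add(row[key])  # also skip duplicates within this month's extract
    return added, previous + added  # `previous` itself is left unchanged


last_month = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
this_month = [{"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
added, current = next_month(last_month, this_month)
print(added)         # [{'id': 3, 'amount': 30}]
print(len(current))  # 3
```

The key point is that `previous` must be readable before anything is written, which is exactly what the Flow's no-cycle rule makes awkward.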
Best regards.
Answers
-
Hi tomtom,
The Flow won't accept circular references, which sounds like what you're describing here.
Hence, if you have something like Dataset_A -> Recipe -> Dataset_B, one solution is to define a Dataset_C that 'points' to Dataset_B. You can do this in the Flow by adding a new dataset whose location (for example, the SQL table) matches that of Dataset_B. This way, you can use both Dataset_A and Dataset_C (which is in fact the same data as Dataset_B) as inputs to your recipe.
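Why the alias works can be illustrated with a plain dependency graph: the Flow must be acyclic, and Dataset_C is a separate node even though it shares Dataset_B's storage. This sketch uses Python's standard `graphlib` module as a stand-in for the Flow's own check (an assumption for illustration only; it is not DSS code), and the `LOCATIONS` mapping and table names are hypothetical:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical storage locations: Dataset_B and Dataset_C share one SQL table.
LOCATIONS = {
    "Dataset_A": "db.table_a",
    "Dataset_B": "db.table_b",
    "Dataset_C": "db.table_b",  # same table as Dataset_B
}


def is_valid_flow(edges):
    """Return True if the graph (node -> set of predecessors) has no cycle."""
    try:
        list(TopologicalSorter(edges).static_order())
        return True
    except CycleError:
        return False


# Direct feedback loop: the recipe reads the dataset it writes.
circular = {"recipe": {"Dataset_A", "Dataset_B"}, "Dataset_B": {"recipe"}}
# Workaround: the recipe reads Dataset_C, an alias of Dataset_B's table.
aliased = {"recipe": {"Dataset_A", "Dataset_C"}, "Dataset_B": {"recipe"}}

print(is_valid_flow(circular))                           # False
print(is_valid_flow(aliased))                            # True
print(LOCATIONS["Dataset_C"] == LOCATIONS["Dataset_B"])  # True
```

The graph stays acyclic while the data still loops back through the shared table.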
I hope this is not too confusing!
-
Hi @Liev,
I'm currently facing a similar issue: I need to "update" some data based on previous results. Could you please explain how to do what you suggested?
Thanks!
P.S. I'm quite new to Dataiku
-
Marlan
Hi @tomtom,
If you don't mind working in SQL, another option besides the one described by @Liev is to use a SQL Script code recipe (to be clear, not a SQL Query recipe). SQL Script recipes don't need to have an input dataset, so you could easily do what you describe (albeit entirely in SQL code).
Marlan
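A minimal sketch of what such a SQL Script might contain, run here against an in-memory SQLite database so it is self-contained. The table names `monthly_history` and `new_data`, their columns, and the hard-coded month are hypothetical, and the exact SQL dialect depends on the database behind your DSS connection:

```python
import sqlite3

# The SQL a script recipe could run: add only this month's rows that are not
# already in the history table (the month literal is a hypothetical placeholder).
SCRIPT = """
INSERT INTO monthly_history (id, month, value)
SELECT n.id, '2024-02', n.value
FROM new_data AS n
WHERE NOT EXISTS (
    SELECT 1 FROM monthly_history AS h WHERE h.id = n.id
);
"""

conn = sqlite3.connect(":memory:")
# Hypothetical starting state: last month's history plus this month's extract.
conn.executescript("""
CREATE TABLE monthly_history (id INTEGER PRIMARY KEY, month TEXT, value REAL);
CREATE TABLE new_data (id INTEGER, value REAL);
INSERT INTO monthly_history VALUES (1, '2024-01', 10.0), (2, '2024-01', 20.0);
INSERT INTO new_data VALUES (2, 25.0), (3, 30.0);
""")

conn.executescript(SCRIPT)  # the table is both read and written by one statement
for row in conn.execute("SELECT id, month, value FROM monthly_history ORDER BY id"):
    print(row)
```

Because the script reads and writes the same table in one statement, no second dataset is needed; running it twice adds nothing the second time.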
-
Ignacio_Toledo
Because English is not my native language, I wouldn't be able to explain @Liev's solution any better in words. So I did the best thing I could: a recording showing how I solved a similar problem in exactly the way Liev described. The video doesn't have audio; what I'm doing is:
1) create a dataset connected to a table called "daily_status_table"
2) use a dataset that contains the history of the daily statuses: the idea is to add new information to this dataset ("history_daily_track") by crossmatching it with "daily_status_table". So first I create a dataset using a connection to the table "history_daily_track" and name it "history_daily_track_as_input"
3) Then I create a second dataset that is also connected to the table "history_daily_track", but this time I name it "history_daily_track_as_output"
4) In my case, I wanted to use a Python recipe to do the crossmatch. So I create the recipe, give it "daily_status_table" and "history_daily_track_as_input" as inputs, and set the already created "history_daily_track_as_output" dataset as its output.
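The crossmatch in step 4 could look like the function below. It is a plain-Python sketch with hypothetical columns (`item`, `status`, `date`): it appends a history row only when an item's status differs from its last recorded one. Inside a DSS Python recipe, the two inputs would typically be loaded with `dataiku.Dataset(...).get_dataframe()` and the result written with `write_with_schema`; since the output dataset is normally overwritten rather than appended to, the recipe should write the full history, not only the new rows.

```python
def crossmatch(history, daily, day):
    """Return the full history plus one new row per item whose status changed.

    Assumes `history` is in chronological order, so the last row seen for an
    item holds its current status. Items never seen before count as changed.
    """
    last_status = {}
    for row in history:
        last_status[row["item"]] = row["status"]
    updates = [
        {"item": r["item"], "status": r["status"], "date": day}
        for r in daily
        if last_status.get(r["item"]) != r["status"]
    ]
    return history + updates


history = [
    {"item": "a", "status": "ok", "date": "2024-03-01"},
    {"item": "b", "status": "ok", "date": "2024-03-01"},
]
daily = [
    {"item": "a", "status": "ok"},
    {"item": "b", "status": "down"},
    {"item": "c", "status": "ok"},
]
for row in crossmatch(history, daily, "2024-03-02"):
    print(row)
```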
Hope this helps!
-
Looks like your video disappeared @Ignacio_Toledo, but if I understand correctly, this workaround relies on an external SQL table.
Could this solution apply to an internal dataset inside my Dataiku project? I want to update it at the start of my flow, to capture old and new parameters inside the flow.
I can't find a way to do it.
Thank you,
-
Ignacio_Toledo
Hi @bob,
Sadly, I no longer have the video. But yes, you can do a similar thing with internal filesystem datasets: create a "new" dataset, then edit its "Path" (using "Edit Anyway") to point to the file you want to update.
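The read-then-rewrite pattern behind two datasets sharing one file path can be sketched with the standard `csv` module. The path and column names are hypothetical and this is not how DSS manages its managed folders internally; the point is that the existing file is read completely before the same path is overwritten:

```python
import csv
import os
import tempfile


def update_in_place(path, new_rows, fieldnames):
    """Read the current file (if any), add `new_rows`, and rewrite the same path."""
    existing = []
    if os.path.exists(path):
        with open(path, newline="") as f:
            existing = list(csv.DictReader(f))  # read fully before overwriting
    combined = existing + new_rows
    tmp = path + ".tmp"
    with open(tmp, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(combined)
    os.replace(tmp, path)  # swap the rewritten file into place
    return combined


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "parameters.csv")  # hypothetical file name
    update_in_place(path, [{"param": "alpha", "value": "1"}], ["param", "value"])
    rows = update_in_place(path, [{"param": "beta", "value": "2"}], ["param", "value"])
    print(len(rows))  # 2
```

Writing to a temporary file and swapping it in avoids leaving a half-written file if the write fails partway.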
I hope this helps!