How to implement a feedback loop on a dataset?

tomtom
Level 2

Hello,



Each month, I have to compute a dataset that takes the previous month's dataset (M-1) and adds some data to it.

I wonder how I could do this in Dataiku, since the recipe would have to take its own last output dataset (M-1) as input.

I don't think it is currently possible to build a feedback loop in Dataiku: can you confirm?

How could I achieve this computation with Dataiku? The "append-only" feature is not a good answer, because before writing anything I need to read last month's output dataset to know what will be new in the current month's output.
Best regards.

4 Replies
Liev
Dataiker Alumni
Hi tomtom,

The flow interface won't like circular references, which sounds like what you're describing here.

Hence, if you have something like this: Dataset_A -> Recipe -> Dataset_B, one solution to your problem is to define a Dataset_C that points to the same storage as Dataset_B. You can do this in the flow by adding a new dataset and matching the location of Dataset_B (for example, the same SQL table). This way you can use both Dataset_A and Dataset_C (which is in fact the same data as Dataset_B) as inputs to your recipe, while Dataset_B remains the output.

I hope this is not too confusing!
Carlos_Q
Level 1

Hi @Liev,

I'm currently facing a similar issue. I need to "update" some data based on previous results. Could you please explain to me how to do what you suggested?

Thanks!

P.S. I'm quite new to Dataiku

Ignacio_Toledo

Because English is not my native language, I couldn't explain @Liev's solution in words any better. So I did the next best thing: a recording showing how I solved a similar problem in exactly the way Liev described.

The video has no audio. Here is what I'm doing in it:

1) Create a dataset connected to a table called "daily_status_table".

2) Open a dataset that contains a history of the daily statuses. The idea is to add new information to this history ("history_daily_track") by cross-matching it with "daily_status_table". So first I create a dataset using a connection to the table "history_daily_track" and name it "history_daily_track_as_input".

3) Then I create a second dataset that is also connected to the table "history_daily_track", but this time I name it "history_daily_track_as_output".

4) In my case, I wanted to use a Python recipe to do the cross-match. So I create the recipe, give it "daily_status_table" and "history_daily_track_as_input" as inputs, and set the already created "history_daily_track_as_output" dataset as its output.
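
For readers who prefer code to a video, the recipe from step 4 could look roughly like the sketch below. This is only an illustration under assumptions: the key column "status_date" and the helper name append_new_rows are hypothetical, and the actual cross-match logic will depend on your data. The merge itself is plain pandas; only run_recipe needs to execute inside a Dataiku Python recipe, where the dataiku package is available.

```python
import pandas as pd


def append_new_rows(history: pd.DataFrame, daily: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return history plus the rows of daily whose key is not yet in history."""
    is_new = ~daily[key].isin(history[key])
    return pd.concat([history, daily[is_new]], ignore_index=True)


def run_recipe() -> None:
    """Recipe body; call this only from inside a Dataiku Python recipe."""
    import dataiku  # provided by the Dataiku runtime, not by pip

    daily = dataiku.Dataset("daily_status_table").get_dataframe()
    # Both datasets below point at the same underlying table.
    history = dataiku.Dataset("history_daily_track_as_input").get_dataframe()

    # "status_date" is a hypothetical key column used for illustration.
    updated = append_new_rows(history, daily, key="status_date")

    dataiku.Dataset("history_daily_track_as_output").write_with_schema(updated)
```

Because the whole history is loaded into memory before anything is written, the recipe reads last period's state first and then overwrites the shared table with the updated version. For very large tables this may not be practical.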

Hope this helps!

Marlan

Hi @tomtom,

If you don't mind working in SQL, another option besides the one described by @Liev is to use a SQL Script code recipe (to be clear, not a SQL Query recipe). SQL Script recipes don't need an input dataset, so you could easily do what you describe (albeit entirely in SQL code).
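
To give an idea of what such a script might contain, here is a minimal sketch of a statement that reads the output table itself to decide which rows are new. The table and column names (monthly_output, monthly_source, item_id, value) are hypothetical, and the statement is run against an in-memory SQLite database from Python purely so the example is self-contained. In a SQL Script recipe you would keep only the SQL, adapted to your database's dialect.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
APPEND_NEW_ROWS_SQL = """
INSERT INTO monthly_output (item_id, value)
SELECT s.item_id, s.value
FROM monthly_source AS s
WHERE NOT EXISTS (
    SELECT 1 FROM monthly_output AS o WHERE o.item_id = s.item_id
)
"""


def demo() -> list:
    """Run the statement on toy data and return the resulting output table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE monthly_output (item_id INTEGER, value TEXT)")
    conn.execute("CREATE TABLE monthly_source (item_id INTEGER, value TEXT)")
    # Last month's output already holds items 1 and 2.
    conn.executemany("INSERT INTO monthly_output VALUES (?, ?)", [(1, "jan"), (2, "jan")])
    # This month's source holds items 2 (already known) and 3 (new).
    conn.executemany("INSERT INTO monthly_source VALUES (?, ?)", [(2, "feb"), (3, "feb")])
    conn.execute(APPEND_NEW_ROWS_SQL)
    rows = conn.execute(
        "SELECT item_id, value FROM monthly_output ORDER BY item_id"
    ).fetchall()
    conn.close()
    return rows
```

Only item 3 is inserted, because items 1 and 2 are already present in the output table when the statement runs.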

Marlan 
