How to implement a feedback loop on a dataset?

tomtom · 05-08-2019

Hello,

Each month, I have to compute a dataset that takes the previous month's dataset (M-1) and adds some new data to it.

I wonder how I could do this in Dataiku, since the recipe would have to take its own last output dataset (M-1) as input.

I don't think it is currently possible to create a feedback loop in Dataiku. Can you confirm?

How could I achieve this computation with Dataiku? The "append-only" feature is not a good answer, because before writing anything, I need to read the (last month's) output dataset to know what will be new in the (current month's) output.
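A minimal sketch of the monthly step being asked about, in plain Python with rows as dicts: read the M-1 output first, keep only the current month's records whose key is not already present, and produce the month-M output. The `id` key column and the sample rows are hypothetical; in Dataiku the inputs and output would be datasets rather than lists.

```python
def build_monthly_dataset(previous_output, current_month_rows, key="id"):
    """Return (new_rows, next_output).

    previous_output: rows of last month's output dataset (M-1), as dicts.
    current_month_rows: rows computed for the current month, as dicts.
    key: hypothetical name of the column that identifies a record.
    """
    # Read the M-1 output first, to know which records already exist.
    seen = {row[key] for row in previous_output}
    # Keep only the records that are new this month (also dedupes within the month).
    new_rows = []
    for row in current_month_rows:
        if row[key] not in seen:
            seen.add(row[key])
            new_rows.append(row)
    # The month-M output is last month's output plus the new records.
    return new_rows, previous_output + new_rows


april = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
may_rows = [{"id": 2, "value": "b"}, {"id": 3, "value": "c"}]

new_rows, may = build_monthly_dataset(april, may_rows)
print(new_rows)  # [{'id': 3, 'value': 'c'}]
print(len(may))  # 3
```

This is exactly why append-only mode alone is not enough: the set of "new" rows can only be computed after the previous output has been read.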

Best regards.

Liev · 10-15-2019

Hi tomtom,

The Flow won't accept circular references, which sounds like what you're describing here.

So, if you have something like Dataset_A -> Recipe -> Dataset_B, one solution is to define a Dataset_C that 'points' to Dataset_B. You can do this in the Flow by adding a new dataset that uses the same location (for example, the same SQL table) as Dataset_B. You can then use both Dataset_A and Dataset_C (which is in fact the same data as Dataset_B) as inputs to your recipe.
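To make the idea concrete, here is a toy model (not Dataiku's actual implementation) that treats the Flow as a dependency graph between named items and checks it for cycles with Python's `graphlib` module (Python 3.9+). A recipe that reads its own output dataset forms a cycle; Dataset_C is a distinct item in the graph, so the graph stays acyclic even though Dataset_C and Dataset_B share the same storage. The table names in `storage` are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def flow_is_acyclic(edges):
    """Return True if the (upstream, downstream) edges between named Flow items form no cycle."""
    graph = {}  # node -> set of its upstream nodes
    for upstream, downstream in edges:
        graph.setdefault(downstream, set()).add(upstream)
        graph.setdefault(upstream, set())
    try:
        # Iterating static_order() raises CycleError when the graph has a cycle.
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


# Direct feedback loop: the recipe reads its own output dataset.
direct = [("Dataset_A", "recipe"), ("Dataset_B", "recipe"), ("recipe", "Dataset_B")]

# Workaround: Dataset_C is a separate Flow item pointing to the same storage as Dataset_B.
aliased = [("Dataset_A", "recipe"), ("Dataset_C", "recipe"), ("recipe", "Dataset_B")]

# Storage location behind each dataset name (hypothetical table names).
storage = {"Dataset_A": "table_a", "Dataset_B": "table_b", "Dataset_C": "table_b"}

print(flow_is_acyclic(direct))   # False
print(flow_is_acyclic(aliased))  # True
```

In this model Dataset_C has no upstream recipe, so the graph no longer records that it depends on Dataset_B; it simply reads whatever was last written to the shared table.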

I hope this is not too confusing!

Carlos_Q · 09-11-2020

Hi @Liev,

I'm currently facing a similar issue. I need to "update" some data based on previous results. Could you please explain how to do what you suggested?

Thanks!

P.S. I'm quite new to Dataiku

Ignacio_Toledo · 09-12-2020

Because English is not my native language, I wouldn't be able to explain @Liev's solution any better in words. So I did the best thing I could: a recording showing how I solve a similar problem in exactly the way Liev described.

The video has no audio; here is what I'm doing:

1) Create a dataset connected to a table called "daily_status_table".

2) Open a dataset that contains a history of the daily statuses. The idea is to add new information to this dataset ("history_daily_track") by cross-matching it with "daily_status_table". So first I create a dataset using a connection to the table "history_daily_track" and name it "history_daily_track_as_input".

3) Then I create a second dataset that is also connected to the table "history_daily_track", but this time I name it "history_daily_track_as_output".

4) In my case, I wanted to use a Python recipe to do the crossmatch. So I create the recipe, give it "daily_status_table" and "history_daily_track_as_input" as inputs, and set the previously created "history_daily_track_as_output" dataset as its output.
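For step 4, a minimal sketch of what the crossmatch logic could look like in plain Python. In an actual Dataiku Python recipe the inputs would typically be read with `dataiku.Dataset("daily_status_table").get_dataframe()` and the result written with `dataiku.Dataset("history_daily_track_as_output").write_with_schema(df)`; here the rows are plain lists of dicts so the example runs anywhere. The matching rule (append a row only when an item's status differs from its latest recorded status) and the column names `item`, `status` and `date` are assumptions for illustration, not what the video does.

```python
def update_history(history, daily_status):
    """Return history plus one new row per item whose status changed.

    history: rows already in "history_daily_track" (read via the *_as_input dataset),
             assumed to be in chronological order.
    daily_status: today's rows from "daily_status_table".
    Column names "item", "status" and "date" are hypothetical.
    """
    # Latest known status per item; later rows in history overwrite earlier ones.
    latest = {}
    for row in history:
        latest[row["item"]] = row["status"]
    # Crossmatch: keep today's rows for new items or items whose status changed.
    changes = [row for row in daily_status if latest.get(row["item"]) != row["status"]]
    # The whole updated history is what gets written to the *_as_output dataset.
    return history + changes


history = [
    {"date": "2020-09-10", "item": "A", "status": "ok"},
    {"date": "2020-09-10", "item": "B", "status": "ok"},
]
today = [
    {"date": "2020-09-11", "item": "A", "status": "ok"},
    {"date": "2020-09-11", "item": "B", "status": "failed"},
    {"date": "2020-09-11", "item": "C", "status": "ok"},
]
updated = update_history(history, today)
print(len(updated))  # 4
```

The sketch returns the full updated history rather than only the new rows, because the input and output datasets point to the same table and the output is rewritten as a whole.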

Hope this helps!

Marlan · 09-11-2020

Hi @tomtom,

If you don't mind working in SQL, another option besides the one described by @Liev is to use a SQL Script code recipe (to be clear, not a SQL Query recipe). SQL Script recipes don't need an input dataset, so you could easily do what you describe, albeit entirely in SQL code.
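A minimal sketch of what such a SQL Script might contain, run here against an in-memory SQLite database through Python's `sqlite3` module so it can be tried outside Dataiku. The table names `monthly_output` and `current_month_data` and the `id` key are hypothetical, and the exact SQL dialect depends on your database connection.

```python
import sqlite3

# Body of a hypothetical SQL Script: it reads last month's output table directly
# and appends only the rows that are not already there.
SQL_SCRIPT = """
INSERT INTO monthly_output (id, value)
SELECT c.id, c.value
FROM current_month_data AS c
WHERE NOT EXISTS (
    SELECT 1 FROM monthly_output AS m WHERE m.id = c.id
);
"""

conn = sqlite3.connect(":memory:")
# Set up sample tables: last month's output and this month's computed data.
conn.executescript("""
CREATE TABLE monthly_output (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE current_month_data (id INTEGER, value TEXT);
INSERT INTO monthly_output VALUES (1, 'a'), (2, 'b');
INSERT INTO current_month_data VALUES (2, 'b'), (3, 'c');
""")

conn.executescript(SQL_SCRIPT)
rows = conn.execute("SELECT id, value FROM monthly_output ORDER BY id").fetchall()
print(rows)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

Because the script both reads from and writes to the same table inside the database, no circular dependency ever appears in the Flow.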

Marlan
