How to implement a feedback loop on a dataset?

tomtom · May 2019

Hello,

Each month, I have to compute a dataset that takes the previous month's dataset (M-1) and adds some data to it.

I wonder how I could do this in Dataiku, since the recipe would need to take the last output dataset (M-1) as its input.

I don't think it is currently possible to build a feedback loop in Dataiku: can you confirm?

How could I achieve this computation with Dataiku? The "append-only" feature is not a good answer, because before writing anything, I need to read the (last month's) output dataset to know what will be new in the (current month's) output.

Best regards.

Liev · October 2019

Hi tomtom,

The flow interface won't like circular references, which sounds like what you're describing here.

Hence, if you have something like this: Dataset_A -> Recipe -> Dataset_B, one solution to your problem is to define Dataset_C by 'pointing it' to Dataset_B. You can do this in the flow by adding a new Dataset and matching the location (for example SQL table) of Dataset_B. This way you can use as input to your recipe both Dataset_A and Dataset_C (which is in fact the same as Dataset_B).
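The recipe logic in this setup can be sketched as a plain Python function, independent of Dataiku: it takes last month's rows (Dataset_C, which shares storage with Dataset_B) and the new rows (Dataset_A), and returns the full content to write to Dataset_B. This is a minimal sketch assuming rows are dicts identified by a hypothetical key column `id`; `build_current_month` is an illustrative name, not a Dataiku API. In an actual Dataiku Python recipe the inputs would typically be loaded with `dataiku.Dataset("...").get_dataframe()`, which this sketch deliberately leaves out so it runs anywhere.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def build_current_month(previous: Iterable[Row], incoming: Iterable[Row], key: str) -> List[Row]:
    """Return last month's rows plus incoming rows whose key is not already present.

    `previous` plays the role of Dataset_C (last month's output, same storage as Dataset_B);
    `incoming` plays the role of Dataset_A. The result is what gets written to Dataset_B.
    """
    result = list(previous)
    seen = {row[key] for row in result}
    for row in incoming:
        if row[key] not in seen:  # only rows that are genuinely new this month
            result.append(row)
            seen.add(row[key])
    return result


if __name__ == "__main__":
    last_month = [{"id": "1", "value": "a"}, {"id": "2", "value": "b"}]
    this_month = [{"id": "2", "value": "b"}, {"id": "3", "value": "c"}]
    print([r["id"] for r in build_current_month(last_month, this_month, key="id")])
```

Because the output contains last month's rows as well as the new ones, the recipe writes Dataset_B in full rather than appending to it.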

I hope this is not too confusing!

Carlos_Q · September 2020

Hi @Liev,

I'm currently facing a similar issue. I need to "update" some data based on previous results. Could you please explain to me how to do what you suggested?

Thanks!

P.S. I'm quite new to Dataiku

Marlan · September 2020

Hi @tomtom,

If you don't mind working in SQL, another option besides the one described by @Liev is to use a SQL Script code recipe (to be clear, not a SQL Query recipe). SQL Script recipes don't need an input dataset, so you could easily do what you describe (albeit entirely in SQL code).
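To illustrate why a SQL Script recipe sidesteps the problem: a single statement can read last month's table and insert into that same table. The sketch below uses Python's built-in `sqlite3` only so it can run standalone; the table names `history` and `staging` and their columns are hypothetical, and the SQL dialect of your actual connection may differ. In a SQL Script recipe you would write only the SQL in `SCRIPT`.

```python
import sqlite3

# Hypothetical tables: `history` holds all previous months, `staging` holds this month's data.
SCRIPT = """
INSERT INTO history (id, value, month)
SELECT s.id, s.value, s.month
FROM staging AS s
WHERE NOT EXISTS (SELECT 1 FROM history AS h WHERE h.id = s.id);
"""


def run_demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE history (id INTEGER PRIMARY KEY, value TEXT, month TEXT);
        CREATE TABLE staging (id INTEGER, value TEXT, month TEXT);
        INSERT INTO history VALUES (1, 'a', '2019-04'), (2, 'b', '2019-04');
        INSERT INTO staging VALUES (2, 'b', '2019-05'), (3, 'c', '2019-05');
    """)
    # The script reads from and writes to the same table: the feedback loop
    # that cannot be drawn in the Flow as a single dataset.
    conn.executescript(SCRIPT)
    rows = conn.execute("SELECT id, value, month FROM history ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_demo())
```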

Marlan

Ignacio_Toledo · September 2020

Because English is not my native language, I wouldn't be able to explain @Liev's solution in words any better. So I did the best thing I could: a recording showing how I solved a similar problem in exactly the way Liev described.

The video doesn't have audio, and what I'm doing is:

1) create a dataset connected to a table called "daily_status_table"

2) open a dataset that contains a history of the daily statuses: the idea is to add new information to this dataset ("history_daily_track") by doing some crossmatching with "daily_status_table". So first I create a dataset using a connection to the table "history_daily_track" and name it "history_daily_track_as_input"

3) Then I create a second dataset that is also connected to the table "history_daily_track", but this time I name it "history_daily_track_as_output"

4) In my case, I wanted to use a Python recipe to do the crossmatch. So I create the recipe, give it "daily_status_table" and "history_daily_track_as_input" as inputs, and set the already-created "history_daily_track_as_output" dataset as its output.
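Since the video is no longer available, here is a minimal sketch of what the crossmatch step in such a Python recipe could look like. The matching rule (append a history row only when an item's status differs from its last recorded status) and the column names `date`, `item` and `status` are assumptions for illustration, not what the original recording showed. In the real recipe, `history` would come from "history_daily_track_as_input", `daily` from "daily_status_table", and the returned rows would be written to "history_daily_track_as_output".

```python
from typing import Dict, List

Row = Dict[str, str]


def crossmatch(history: List[Row], daily: List[Row], today: str) -> List[Row]:
    """Append today's status for each item whose status changed (hypothetical rule)."""
    # Latest known status per item; assumes `history` is sorted by date ascending.
    last_status: Dict[str, str] = {}
    for row in history:
        last_status[row["item"]] = row["status"]

    # Keep the full history: the output dataset overwrites the same table.
    updated = list(history)
    for row in daily:
        if last_status.get(row["item"]) != row["status"]:
            updated.append({"date": today, "item": row["item"], "status": row["status"]})
    return updated


if __name__ == "__main__":
    hist = [{"date": "2020-09-01", "item": "A", "status": "ok"}]
    today_rows = [{"item": "A", "status": "ok"}, {"item": "B", "status": "down"}]
    for r in crossmatch(hist, today_rows, today="2020-09-02"):
        print(r)
```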

Hope this helps!

bob · August 2024

Looks like your video disappeared @Ignacio_Toledo, but if I understand correctly, this workaround relies on an external SQL table.

Could this solution apply to an internal dataset inside my Dataiku project, one that I want to update at the start of my flow to capture old + new parameters?

I can't find a way to do it,

Thank you,

Ignacio_Toledo · August 2024

https://community.dataiku.com/discussion/comment/44203#Comment_44203

Hi @bob

Sadly, I no longer have the video. But yes, you can do something similar with filesystem internal datasets: create a "new" dataset, then edit its "Path" (Edit Anyway) so that it points to the file you want to change.
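With filesystem datasets, the input and output datasets end up pointing at the same file, so it is safest for the recipe to read the old content completely before writing the new content. The sketch below simulates this with a plain CSV file and Python's `csv` module; the `update_shared_file` helper and the file layout are hypothetical and do not reproduce how Dataiku itself stores managed datasets. In this plain-file simulation, opening the file in write mode truncates it, which is why the read happens first.

```python
import csv
import os
import tempfile
from typing import Dict, List


def update_shared_file(path: str, new_rows: List[Dict[str, str]], key: str) -> List[Dict[str, str]]:
    # "_as_input" role: load the previous content fully into memory first.
    # Assumes the file already exists and has a header row.
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        old_rows = list(reader)

    seen = {r[key] for r in old_rows}
    merged = old_rows + [r for r in new_rows if r[key] not in seen]

    # "_as_output" role: reopen the same file for writing; mode "w" truncates it.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(merged)
    return merged


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "params.csv")
    with open(path, "w", newline="") as f:
        f.write("id,param\n1,a\n2,b\n")
    update_shared_file(path, [{"id": "2", "param": "b"}, {"id": "3", "param": "c"}], key="id")
    with open(path) as f:
        print(f.read())
```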

I hope this helps!
