replace table before running scenario

Solved!
Kenza98
Level 2
replace table before running scenario

Hello everyone,

I have a flow that I want to make automatic by using a scenario and I want to replace "last_table" by "new_table" before running the scenario because I have to compare these two tables after each run.

How can we do it ?

Thank's in advance,

Kenza

1 Solution
Liev
Dataiker Alumni

You could have the flow like this:

Source dataset --> Recipe_1 --> This_month_dataset --> Sync_Recipe --> Last_month_dataset

whereRecipe_1 is either a sync, prepare or filter type but it can really be any type.

Your scenario has two build steps, the first one builds Last_month_dataset and under Build Mode you build "only this dataset", then you build This_month_dataset.

It's important that:

- these are separate scenario build steps and not two datasets in the same build step.

- the Sync_Recipe doesn't append (which is the default)

That would leave you with two identical (in structure) datasets where each run last month is populated with the previous "current month" and then current month is rebuilt.

Of course, this assumes there're no errors of failure in your runs, or that historical comparisons don't need to be rerun, etc. But is a good start to get a feeling for what comparing them might look like.

If you need to think of more complex scenarios, then I would suggest you just append into a single table, but include a column to denote an insertion timestamp column so that you can reproduce later your results.

I hope this makes sense.

View solution in original post

5 Replies
Kenza98
Level 2
Author

I can see the step 'Clear' so I can clear data from "last_table" by how can I fill it with "new_table" content before running the scenario ?

Liev
Dataiker Alumni

Hi @Kenza98 ,

This is a little light on detail since it's unclear why you might need to remove the tables, or where those tables live, or what it means comparing them, etc.

A couple of patterns to look at:

- "append" into the dataset and do the comparison in a group or join recipe?

- keep both tables independent and run each day. One table is"current", the other is "old". Define as part of their definitions or creation process such that data falls into "old" or "current" depending on some time concept or rules applicable.

I hope this helps in the meantime, but as you can see a lot will depend on your infrastructure, goals and desired methodology...

Kenza98
Level 2
Author

Here are some more details :

- The scenario should be run every month

- In the flow, I have two datasets : one contains the output of the previous month "last_table" and the other dataset contains the output of the current month "new_table".

Now my question is how can I update "last_table" with the content of "new_table" in a scenario (each month) ?

This way, every month I can make sure that both datasets are updated in the flow (one of the current month and the other of the previous month).

Hope it is more clear now ๐Ÿ™‚ 

Thank's for your help !

0 Kudos
Liev
Dataiker Alumni

You could have the flow like this:

Source dataset --> Recipe_1 --> This_month_dataset --> Sync_Recipe --> Last_month_dataset

whereRecipe_1 is either a sync, prepare or filter type but it can really be any type.

Your scenario has two build steps, the first one builds Last_month_dataset and under Build Mode you build "only this dataset", then you build This_month_dataset.

It's important that:

- these are separate scenario build steps and not two datasets in the same build step.

- the Sync_Recipe doesn't append (which is the default)

That would leave you with two identical (in structure) datasets where each run last month is populated with the previous "current month" and then current month is rebuilt.

Of course, this assumes there're no errors of failure in your runs, or that historical comparisons don't need to be rerun, etc. But is a good start to get a feeling for what comparing them might look like.

If you need to think of more complex scenarios, then I would suggest you just append into a single table, but include a column to denote an insertion timestamp column so that you can reproduce later your results.

I hope this makes sense.

Kenza98
Level 2
Author

Oh great !! I was thinking about the sync recipe as a possible solution and this is what you said. 

Thank you so much for your help ๐Ÿ™‚ 

Have a great day !

0 Kudos