I have a flow that I want to make automatic by using a scenario and I want to replace "last_table" by "new_table" before running the scenario because I have to compare these two tables after each run.
How can we do it ?
Thank's in advance,
Hi @Kenza98 ,
This is a little light on detail since it's unclear why you might need to remove the tables, or where those tables live, or what it means comparing them, etc.
A couple of patterns to look at:
- "append" into the dataset and do the comparison in a group or join recipe?
- keep both tables independent and run each day. One table is"current", the other is "old". Define as part of their definitions or creation process such that data falls into "old" or "current" depending on some time concept or rules applicable.
I hope this helps in the meantime, but as you can see a lot will depend on your infrastructure, goals and desired methodology...
Here are some more details :
- The scenario should be run every month
- In the flow, I have two datasets : one contains the output of the previous month "last_table" and the other dataset contains the output of the current month "new_table".
Now my question is how can I update "last_table" with the content of "new_table" in a scenario (each month) ?
This way, every month I can make sure that both datasets are updated in the flow (one of the current month and the other of the previous month).
Hope it is more clear now 🙂
Thank's for your help !
You could have the flow like this:
Source dataset --> Recipe_1 --> This_month_dataset --> Sync_Recipe --> Last_month_dataset
whereRecipe_1 is either a sync, prepare or filter type but it can really be any type.
Your scenario has two build steps, the first one builds Last_month_dataset and under Build Mode you build "only this dataset", then you build This_month_dataset.
It's important that:
- these are separate scenario build steps and not two datasets in the same build step.
- the Sync_Recipe doesn't append (which is the default)
That would leave you with two identical (in structure) datasets where each run last month is populated with the previous "current month" and then current month is rebuilt.
Of course, this assumes there're no errors of failure in your runs, or that historical comparisons don't need to be rerun, etc. But is a good start to get a feeling for what comparing them might look like.
If you need to think of more complex scenarios, then I would suggest you just append into a single table, but include a column to denote an insertion timestamp column so that you can reproduce later your results.
I hope this makes sense.