Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
I have a workflow in Alteryx that uses the multi-row formula tool to create a new column called "Activate_DateTime". I'm struggling to replicate this in Dataiku.
Here is my formula in Alteryx:
ELSE [Row-1:Activate_DateTime] ENDIF
I'm guessing I will have to use the "Windows" recipe, but even after reading about it, I don't fully understand how to utilize that recipe. Any help in re-creating this is greatly appreciated, thank you!
Operating system used: Windows
Hi @FarSideFeb ,
Your guess is correct, the Window recipe will be what you'll want to use here! In this case, I'm assuming that you want one window for your entire dataset that doesn't use partitioning columns.
LMK if this helps!
Hey @AshleyW ,
Thank you so much for the response, the way you laid out your explanation is great! I'm new to the Windows recipe so forgive me, but how do I create a new column containing the previous value of Activate_DateTime in the "Aggregations" step?
It's going to be on the 'Aggregations' step of the Window recipe where you can retrieve the original value, add a sum/min/max etc. The lag, lag diff, lead, and lead diff options are on the right; they're not aggregations, but they're available in that particular step. Click on the lag/lead etc buttons and specify the number of rows you want to use as the offset. In this case, that'll be 1 since you're retrieving the prior row's value.
I like to use a Window recipe to retrieve all of the lag/lead offsets I might need in one go: when you specify the number of rows, you can use multiple. For example, if you input 1,2 in the lag offset box, you'll end up with two new columns: one containing the value of the preview row, the other containing the value 2 rows prior. The ability to calculate the difference using various date parts is also pretty neat.
In this example from a Window Recipe tutorial, they've selected lag and lag diff 1. This means that once the recipe is run, they'll have a new column purchase_date_lag (the value of purchase date in the prior row) and purchase_date_lag_diff (the difference in days between the current value and the prior value of purchase date)