Alteryx Multi-Row Formula to Dataiku?

FarSideFeb
Level 3
Alteryx Multi-Row Formula to Dataiku?

I have a workflow in Alteryx that uses the multi-row formula tool to create a new column called "Activate_DateTime". I'm struggling to replicate this in Dataiku.

Here is my formula in Alteryx: 
     IFContains([Prov_Type_Desc], "ACTIVATE")

     THEN [DateTime]

      ELSE [Row-1:Activate_DateTime] ENDIF

I'm guessing I will have to use the "Windows" recipe, but even after reading about it, I don't fully understand how to utilize that recipe. Any help in re-creating this is greatly appreciated, thank you!


Operating system used: Windows

0 Kudos
3 Replies
AshleyW
Dataiker

Hi @FarSideFeb ,

Your guess is correct, the Window recipe will be what you'll want to use here! In this case, I'm assuming that you want one window for your entire dataset that doesn't use partitioning columns.

  • Create a Window Recipe
  • In the window definition step:
    • Don't define a partition
    • If the rows in your dataset need to be ordered in a specific way since you're retrieving a lag/lead value, do it under 'Order Columns'.
    • Leave the window frame turned off
  • In the aggregation step:
    • create a new column containing the previous value of Activate_DateTime. I'm guessing "Row-1: Activate_DateTime" is a lag1 of Activate_DateTime retrieving the prior row's value of Activate_DateTime
  • Run the recipe
  • Use a Prepare with a formula  containing the logic you've described above, if(contains([Prov_Type_Desc], "ACTIVATE", [DateTime], [Activate_DateTime_lag]

LMK if this helps!

Cheers, 

Ashley

0 Kudos
FarSideFeb
Level 3
Author

Hey @AshleyW ,

Thank you so much for the response, the way you laid out your explanation is great! I'm new to the Windows recipe so forgive me, but how do I create a new column containing the previous value of Activate_DateTime in the "Aggregations" step?

0 Kudos
AshleyW
Dataiker

Hi,

It's going to be on the 'Aggregations' step of the Window recipe where you can retrieve the original value, add a sum/min/max etc. The lag, lag diff, lead, and lead diff options are on the right; they're not aggregations, but they're available in that particular step. Click on the lag/lead etc buttons and specify the number of rows you want to use as the offset. In this case, that'll be 1 since you're retrieving the prior row's value.

I like to use a Window recipe to retrieve all of the lag/lead offsets I might need in one go: when you specify the number of rows, you can use multiple. For example, if you input 1,2 in the lag offset box, you'll end up with two new columns: one containing the value of the preview row, the other containing the value 2 rows prior. The ability to calculate the difference using various date parts is also pretty neat.

In this example from a Window Recipe tutorial, they've selected lag and lag diff 1. This means that once the recipe is run, they'll have a new column purchase_date_lag (the value of purchase date in the prior row) and purchase_date_lag_diff (the difference in days between the current value and the prior value of purchase date)

merchant-failures-aggregations.png

Cheers,

Ashley

 

0 Kudos