Create running number in Window recipe

luongnguyen
luongnguyen Partner, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 2 Partner

Hi community,

I have a dataset that contains multiple line for each id like this:

idDate
123401/11/2014
123401/12/2014
123401/09/2015
123401/12/2015
123401/11/2018

I then used a window recipe to calculate the date difference by number of month between each line with a custom aggregation and then I have this:

idDateDifference
123401/11/2014
123401/12/20141
123401/09/20159
123401/12/20153
123401/11/201835

Now, I want to create another column to define each run of the Id. The run is the same if the difference is not more than six months.

The result should look like this:

idDateDifferenceRun
123401/11/2014 1
123401/12/201411
123401/09/201592
123401/12/201532
123401/11/2018353

Can someone help me to achieve this? (Ideally with visual recipe and custom_aggregation in SQL and without Python).

Can I achieve this in the same Window Recipe that calculate the 'Difference' column?

Thanks a lot for your help.

Best regards,


Operating system used: Window

Answers

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    Hi,

    given that you're using a special rule to compute whether an increment in the running number is needed, this won't be doable in a single recipe.

    You'll need

    - a first Window recipe like the one you have now, to compute the difference

    - a Prepare recipe on the output of the first Window recipe, to convert difference into 0 or 1 depending on whether the difference is more than 6 months or not (something like a Formula step with `if(difference > 6, 1, 0)`

    - a second Window recipe, where you partition and order like in the first recipe, and put the frame to "limit following -> 0" and no limit preceding, with a "Sum" aggregate on the 0/1 column

  • luongnguyen
    luongnguyen Partner, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 2 Partner

    Hi,

    Thanks a lot for your response.

    I think that makes sense.

    I thought about that solution also but it seems a bit heavy as manipulation.

    I can't skip the extre Prepare Recipe by using the computed_columns directly in the second Window Recipe and it works.

    It's a shame we can do it in the same recipe. I think I can do it in the same SQL recipe by using nested calculation but it will be more complicate after for the maintenance.

    Thanks a lot,

Setup Info
    Tags
      Help me…