Create running number in Window recipe
Hi community,
I have a dataset that contains multiple line for each id like this:
id | Date |
1234 | 01/11/2014 |
1234 | 01/12/2014 |
1234 | 01/09/2015 |
1234 | 01/12/2015 |
1234 | 01/11/2018 |
I then used a window recipe to calculate the date difference by number of month between each line with a custom aggregation and then I have this:
id | Date | Difference |
1234 | 01/11/2014 | |
1234 | 01/12/2014 | 1 |
1234 | 01/09/2015 | 9 |
1234 | 01/12/2015 | 3 |
1234 | 01/11/2018 | 35 |
Now, I want to create another column to define each run of the Id. The run is the same if the difference is not more than six months.
The result should look like this:
id | Date | Difference | Run |
1234 | 01/11/2014 | 1 | |
1234 | 01/12/2014 | 1 | 1 |
1234 | 01/09/2015 | 9 | 2 |
1234 | 01/12/2015 | 3 | 2 |
1234 | 01/11/2018 | 35 | 3 |
Can someone help me to achieve this? (Ideally with visual recipe and custom_aggregation in SQL and without Python).
Can I achieve this in the same Window Recipe that calculate the 'Difference' column?
Thanks a lot for your help.
Best regards,
Operating system used: Window
Answers
-
Hi,
given that you're using a special rule to compute whether an increment in the running number is needed, this won't be doable in a single recipe.
You'll need
- a first Window recipe like the one you have now, to compute the difference
- a Prepare recipe on the output of the first Window recipe, to convert difference into 0 or 1 depending on whether the difference is more than 6 months or not (something like a Formula step with `if(difference > 6, 1, 0)`
- a second Window recipe, where you partition and order like in the first recipe, and put the frame to "limit following -> 0" and no limit preceding, with a "Sum" aggregate on the 0/1 column
-
luongnguyen Partner, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 2 Partner
Hi,
Thanks a lot for your response.
I think that makes sense.
I thought about that solution also but it seems a bit heavy as manipulation.
I can't skip the extre Prepare Recipe by using the computed_columns directly in the second Window Recipe and it works.
It's a shame we can do it in the same recipe. I think I can do it in the same SQL recipe by using nested calculation but it will be more complicate after for the maintenance.
Thanks a lot,