Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Hello Dataiku,
Can anyone advise, how I can get past 7 days rolling data in windows or group by recipe? My data does not have sequential dates, but I need a count for different column (say id) for seven previous sequential date? Suppose there are no records in 7 days frame, give me zero.
Can you please post an Excel document with some dummy data showing what you want calculated? See this thread for how this helps people get help.
Hi, thanks for your reply:
So basically here is the requirement.
Take end date and go back sequential 7 back grouped by user and pull the count of values in column 'Data'.
(for example in below table: for user-039X, going back from april 26, I need 7 days back which is till April 19, count of values of "DATA" column. So only in 26th April, there is one value of User 039X which is AA118. So the new column (say count) would be 1.
User | PART_DATE | Data |
039X | 26-Apr-24 | AA118 |
039X | 15-Apr-24 | AA117 |
039X | 12-Apr-24 | AA116 |
039X | 10-Apr-24 | AA115 |
039X | 5-Apr-24 | AA114 |
039X | 3-Feb-24 | AA113 |
039X | 24-Jan-24 | AA112 |
039X | 23-Jan-24 | AA111 |
039X | 18-Jan-24 | AA110 |
039X | 15-Jan-24 | AA109 |
039X | 15-Nov-23 | AA107 |
039X | 15-Nov-23 | AA108 |
039X | 8-Nov-23 | AA106 |
039X | 7-Nov-23 | AA105 |
Answer is:
rolling part-date | User | Count of Data |
26 April 2024 - 19 April 2024 | 039X | 1 |
25 April 2024-18 April 2024 | 039X | 0 |
24 April 2024-17 April 2024 | 039X | 0 |
23 April 2024-16 April 2024 | 039X | 0 |
22 April 2024-15 April 2024 | 039X | 1 |
How I can get within prepare recipe or window recipe
You can easily achieve this in a Window recipe but you need to meet some requirements first. First you need to make sure you are using actual dates data types so use a prepare recipe to convert your field to a native date. Since you want to produce an output which has dates your data doesn't have you will need to fill in the gaps, this is called data densification. There are many ways of doing that but in a nutshell if you get a distinct list of all your users and do a cartisian join with a table that has all the dates sequentially you will end up with a table that has all Users against all Dates. Then you join it back to your actual dataset joining by User and Date to remove the duplicates (as some Users have data on some Dates). I posted about this here and here.
You will end up with a dataset like this:
Now it's easy to build a Window recipe as follows:
And here is how the output looks like: