The Dataiku Frontrunner Awards have just launched to recognize your achievements! Submit Your Entry

window sum around date

Solved!
DavidALI
Level 2
window sum around date

Hi, 

I have a table with columns Name , Birth_Date ( year) , count

Name         Birth_Date        count

Richard       1978                  3

Richard      1981                   2

Mary           1956                   2

I would like to calculate for each name the sum of count for the Birth_date and during a period of five years before and five years after the Birth_Date ( for example : how many Richard are born during the period 1973-1983 )

The results would be something like :

Name         Birth_Date        count      Birth_date_frame_time_10         count_10_years_frame_time

Richard       1978                  3                   1973-1983                                          39

Richard      1981                   5                   1976-1986                                           23

Mary           1956                   4                   1951-1961                                          34

 

I tried the window recipe, partition by name, order by Birth_date, window frame ( limit window on a value of range of the ordered column, use lower bound : - 5 , use upper bound : 5)

but it results in count_10_years_frame_time wich are sometimes lower of  count , like that : 

Name         Birth_Date        count      Birth_date_frame_time_10          count_10_years_frame_time

Richard       1978                  3                      1973-1983                                          2

Richard      1981                   5                      1976-1986                                          20

Mary           1956                   4                       1951-1961                                          3

Thank you for your precious help

David

 

 

 

 

0 Kudos
1 Solution
fchataigner2
Dataiker
Dataiker

Hi,

using lower bound = 5 and upper bound = 5 translates into "all rows where d - 5 <= Birth_Date <= d + 5 ,  with d being the value of Birth_Date for the row being processed", so it's a window of 11 years actually. Note that lower bound = -5 is also valid and translated into "all rows where d - (-5) <= Birth_Date <= d + 5" which is actually "all rows where Birth_Date = d + 5" 

View solution in original post

0 Kudos
2 Replies
fchataigner2
Dataiker
Dataiker

Hi,

using lower bound = 5 and upper bound = 5 translates into "all rows where d - 5 <= Birth_Date <= d + 5 ,  with d being the value of Birth_Date for the row being processed", so it's a window of 11 years actually. Note that lower bound = -5 is also valid and translated into "all rows where d - (-5) <= Birth_Date <= d + 5" which is actually "all rows where Birth_Date = d + 5" 

View solution in original post

0 Kudos
DavidALI
Level 2
Author

Thank you !

I would never had guessed alone that the minus is implicit in the lower bound parameter.

 

0 Kudos
A banner prompting to get Dataiku DSS