Check out the first Dataiku 8 Deep Dive focusing on Productivity on October 29th Read More

Issue with average calculation in Window

Level 3
Issue with average calculation in Window

Hi everyone,

As said in the title, I have an issue to get a calculation of average for the parameter  "Spending" in a Window calculation.

It is an example of list of customers, with dates of visits and spending. I have previously calculated for each customer the visit number ordered in time ("Visit_number"), and binned this parameter with a width of 2 ("Visit_number_bin"). The dataset before the Window step looks as shown below :

Capture10.JPG

With the Window calculation, I aim to calculate the average spending for a partition defined by Customer / Visit_number_bin.

The Window calculation is set as shown below :

Capture11.JPG

Capture12.JPG

But the result I get is a moving average of spending in each partition, rather than simply the spending, as illustrated below. For example, the lines 2 and 3 are one partition, and I would expect the spending average to be 6.5 for both these lines, but it is 9 on the first line, and 6.5 on the second line. I do not understand why I get a moving average in the results, as I set the "Window frame" to off.

Does anyone have an idea how to set the Window calculation to get the average for each group?

Thanks in advance for your help.

Olivier

Capture13.JPG

 

 

0 Kudos
2 Replies
Dataiker
Dataiker

Hi,

the default behavior for windows is to be "unbounded preceding and 0 following", so that if you do a sum() you get a running sum.

To have the average per group on each row, you need to activate the window frame and leave the limits unset, so as to explicitely request "unbounded preceding and unbounded following" (you can give any ordering column since it won't impact the avg() )

Level 3
Author

Hi @fchataigner2 ,

Thanks for your quick reply, it is very clear, and I would never have guessed!

Have a nice day,

Olivier

0 Kudos