Issue with average calculation in Window

OlivierW
OlivierW Registered Posts: 22 ✭✭✭✭

Hi everyone,

As said in the title, I have an issue to get a calculation of average for the parameter "Spending" in a Window calculation.

It is an example of list of customers, with dates of visits and spending. I have previously calculated for each customer the visit number ordered in time ("Visit_number"), and binned this parameter with a width of 2 ("Visit_number_bin"). The dataset before the Window step looks as shown below :

Capture10.JPG

With the Window calculation, I aim to calculate the average spending for a partition defined by Customer / Visit_number_bin.

The Window calculation is set as shown below :

Capture11.JPG

Capture12.JPG

But the result I get is a moving average of spending in each partition, rather than simply the spending, as illustrated below. For example, the lines 2 and 3 are one partition, and I would expect the spending average to be 6.5 for both these lines, but it is 9 on the first line, and 6.5 on the second line. I do not understand why I get a moving average in the results, as I set the "Window frame" to off.

Does anyone have an idea how to set the Window calculation to get the average for each group?

Thanks in advance for your help.

Olivier

Capture13.JPG

Best Answer

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker
    Answer ✓

    Hi,

    the default behavior for windows is to be "unbounded preceding and 0 following", so that if you do a sum() you get a running sum.

    To have the average per group on each row, you need to activate the window frame and leave the limits unset, so as to explicitely request "unbounded preceding and unbounded following" (you can give any ordering column since it won't impact the avg() )

Answers

Setup Info
    Tags
      Help me…