## Sign up to take part

Registered users can ask their own questions, contribute to discussions, and be part of the Community!

This website uses cookies. By clicking OK, you consent to the use of cookies. Read our cookie policy.

Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Survey banner

Share your feedback on the Dataiku documentation with this 5 min survey. Thanks! TAKE THE SURVEY

Registered users can ask their own questions, contribute to discussions, and be part of the Community!

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Solved!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Using Window recipe and moving average

Hi - I am not sure that I understand how the window recipe works in terms of getting a moving average for the prior 3 months of data and I am hoping this can be clarified for me or there is a solution to what I want to do.

I attached a test_window.csv file that has 3 columns - id, month_year and total_hours.

What I want to do is get an average of the prior three months - see the last column of values I expect to see. Instead, I am not exactly sure what is being calculated (see test_window_prepared_windows.csv).

Thank you in advance for all the help!

id | month_year | total_hours | prior 3 month avg |

E123 | 2018-01-01T00:00:00.000Z | 0 | 0 |

E123 | 2018-02-01T00:00:00.000Z | 2 | 0 |

E123 | 2018-03-01T00:00:00.000Z | 95 | 1 |

E123 | 2018-05-01T00:00:00.000Z | 7.5 | 32 |

E123 | 2018-06-01T00:00:00.000Z | 25 | 35 |

E123 | 2018-07-01T00:00:00.000Z | 63 | 43 |

1 Solution

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi @aw30. The window logic, using any kind of tool (not only DSS), always confuses me and I've to go for a trial and error approach to try to understand it.

In this case you need to do a minor modification to your recipe to get the result you want:

So, besides limiting the number of preceding rows, you also had to limit the number of the following rows, and to limit to '-1', so the current row is not taken into account in the average calculated.

Before doing that, and by check the output you provided, the logic of the window recipe went like "group together all entries with the same id and order them by month. Then for each row, take the average of ALL the entries in the group after removing the entries that of rows that are 4 steps or more behind"

So for the first row of group with id E123, the average is taken over all the values (502.5/11 = 45.68), including the value of the first row; moving to the next row nothing changes, so again you get the same value, and the same for the 3rd and 4th row. For the 5th, your window constraint is applied, and the first value is removed, and now the average is 502.5/10 = 50.25, and the 6th row is 500.5/9 = 55.61, and so on.

Hope this helps!

Solutions shown first - Read whole discussion

6 Replies

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi @aw30. The window logic, using any kind of tool (not only DSS), always confuses me and I've to go for a trial and error approach to try to understand it.

In this case you need to do a minor modification to your recipe to get the result you want:

So, besides limiting the number of preceding rows, you also had to limit the number of the following rows, and to limit to '-1', so the current row is not taken into account in the average calculated.

Before doing that, and by check the output you provided, the logic of the window recipe went like "group together all entries with the same id and order them by month. Then for each row, take the average of ALL the entries in the group after removing the entries that of rows that are 4 steps or more behind"

So for the first row of group with id E123, the average is taken over all the values (502.5/11 = 45.68), including the value of the first row; moving to the next row nothing changes, so again you get the same value, and the same for the 3rd and 4th row. For the 5th, your window constraint is applied, and the first value is removed, and now the average is 502.5/10 = 50.25, and the 6th row is 500.5/9 = 55.61, and so on.

Hope this helps!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi -thank you for the explanation this works!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi Ignacio,

Thank you for this explanation, like you, I have to do things by trial and error.

I'm trying to use a windows recipe to calculate the 4 week moving average. I set my WINDOW FRAME to limit preceding rows (3) and limit following rows (0). It gives me the expected moving average. However, my first 3 rows still have results. How can I give those rows a value of zero(0) since they aren't using the full 4 weeks in the calculation?

I feel like I have tried everything and I cannot figure it out.

Thanks for your help!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

My pleasure @aw30 !

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Can I use the built in window logic to get a 7 day rolling average?

To get a rolling average I have been using a Python script but would rather use Dataiku features.

I posted by python script in this thread: https://community.dataiku.com/t5/Using-Dataiku-DSS/How-can-I-compute-an-exponential-moving-average-i...

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi @7TonRobot. I would say that yes, of course you could use it! In that case, depending on how you define the 7 day rolling average. From your python script, you are using the .rolling() method with the default options, where according to the pandas documentation "the result is set to the right edge of the window". This means that you should set the "Limiting Preceding rows" to 6 and "Limiting following rows" to 0.

Cheers!

Didn't Find What You Needed?