Generating rows and 3 months avg. value

ShrimpMania · September 2023

I used window visual recipes in order to get 3 months avg. value but the output wasn't what I expected. I want to get the exact 3 months avg. value but in the window recipe it calculates recent 3months with only available date so if it doesn't have data in the recent 3 months then it doesn't work.

e.g. for the 2nd row of the Input, as there are no Feb and March data, it still shows 3 months avg value is 440 but I was expecting it to be (0+0+440) / 3 = 146.66.

I have 2000ish rows including date from 2020 and the data will be getting bigger as it's generated every month. What I need to do is

1. all the columns except date and value should be grouped

2. if there's no data then it should be filled with 0

3. Need to get exact 3 months avg. value.

How can I make it? I'd prefer to have it done via SQL code, if it's better to be done via python then it would be good as well.

Input:

specialty code	product	company	type	date	value
A	chocolate	m&m's	grocery	2020-01-01T00:00:00.000Z	3300
A	candy	m&m's	grocery	2020-04-01T00:00:00.000Z	440
B	pasta	molly	grocery	2020-01-01T00:00:00.000Z	1100
B	pasta	molly	grocery	2020-03-01T00:00:00.000Z	800
E	laptop	hp	electric	2020-01-01T00:00:00.000Z	8800
E	laptop	hp	electric	2020-02-01T00:00:00.000Z	11000
E	laptop	hp	electric	2020-03-01T00:00:00.000Z	8800
E	laptop	hp	electric	2020-04-01T00:00:00.000Z	7000

Expected Output:

specialty code	product	company	type	date	value	3mth avg.
A	chocolate	m&m's	grocery	2020-01-01T00:00:00.000Z	3300	1100
A	chocolate	m&m's	grocery	2020-02-01T00:00:00.000Z	0	1100
A	chocolate	m&m's	grocery	2020-03-01T00:00:00.000Z	0	1100
A	chocolate	m&m's	grocery	2020-04-01T00:00:00.000Z	0	0
A	candy	m&m's	grocery	2020-01-01T00:00:00.000Z	0	0
A	candy	m&m's	grocery	2020-02-01T00:00:00.000Z	0	0
A	candy	m&m's	grocery	2020-03-01T00:00:00.000Z	0	0
A	candy	m&m's	grocery	2020-04-01T00:00:00.000Z	440	146.666667
B	pasta	molly	grocery	2020-01-01T00:00:00.000Z	1100	366.666667
B	pasta	molly	grocery	2020-02-01T00:00:00.000Z	0	366.666667
B	pasta	molly	grocery	2020-03-01T00:00:00.000Z	800	633.333333
B	pasta	molly	grocery	2020-04-01T00:00:00.000Z	0	266.666667
E	laptop	hp	electric	2020-01-01T00:00:00.000Z	8800	2933.33333
E	laptop	hp	electric	2020-02-01T00:00:00.000Z	11000	6600
E	laptop	hp	electric	2020-03-01T00:00:00.000Z	8800	9533.33333
E	laptop	hp	electric	2020-04-01T00:00:00.000Z	7000	8933.33333

Turribeach · September 2023

I am not sure if the Window Recipe can deal with missing rows. But this would work if you do data densification, in other words fill the gaps that your data has in SQL.

Build a select statement with a distinct list of all products:

SELECT DISTINCT specialty, code, product, company, type FROM some_table

Now build a select statement with all the dates using the min and max from your column date as driver (ie 2020-01-01, 2020-01-02, 2020-01-03, 2020-01-04, etc). I can't tell you how to do this as I don't know which SQL technology you are using but it's easy to do on most SQL technologies.

Then do a cartisian join from the two select statements above adding also a dummy column called value with 0 as the value. So now you should have a query that returns every single product combination for every single date and there are no missing gaps.

Finally left join the "densification" query with your original query using all your product attributes (specialty, code, product, company, type and date) doing a group by and sum(value) to remove the 0 value rows where you already have an actual value. Finally use the Window recipe as above and it should work fine now.

Generating rows and 3 months avg. value

Best Answer

Categories

Setup Info

Tags