Weighted Average - Pivot Recipe
Hello all,
I've been trying to populate Pivot recipe output with weighted averages and failing.
For example: if values are Vx and the corresponding weights are Wx, weighted average is the
(Sum(Vx * Wx)/Sum(Wx)).
In Excel, you create a helper column of the numerator (Vx * Wx) and create a calculated field as Sum(Helper column/Weights).
Example:
Input table:
Category | Values (Vx) | Weights (Wx) | Helper Column for Excel (Vx * Wx) |
Category 1 | 100 | 1.0 | 100 |
Category 1 | 200 | 0.5 | 100 |
Category 2 | 400 | 0.2 | 80 |
Category 2 | 150 | 0.8 | 120 |
Output table:
Category | Weighted Average Calculation | Weighted Output |
Category 1 | (100 + 100)/(1+0.5) | 133.33 |
Category 2 | (80 + 120)/(0.2 + 0.8) | 200 |
Any solutions would be greatly appreciated!
Answers
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
Hi @bratman
,Do you have a sample dataset representing your input, and what is the expected output? I can think of different ways of achieving what you need, but I'm not sure what kind of "pivot" are you doing first.
Also, if I understand correctly, the weighted average is Sum(Vx * Wx)/Sum(Wx), but you wrote:
Sum(Sum(Vx * Wx)/Sum(Wx))
Which looks like a sum of the weighted averages. Is that a final aggregation you do?
Cheers!
-
Hello @Ignacio_Toledo
, I have added an example. It looks like I cannot add any computations with more than 1 column involved. Any workaround would be appreciated! -
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
Hi @bratman
,Thanks for providing example data. Here is my solution in a recorded video.
Basically, there are 3 stages to get from the input to the output:
- Prepare recipe to create the column with the "weighted values" for each row (vx * wx)
- Group by Category, and sum the weights and weighted values
- Prepare recipe to create the column with the final weighted average.
I hope this helps. There are other options, like using a python recipe, in which case you only need one step. But I guess the idea is to see how to do this with the visual recipes. Maybe other users have a more elegant solution?
Please let me know if you have any doubts.
-
Hello @Ignacio_Toledo
, thanks a lot for your answer.Is there any way I could calculate the weighted average at the visual recipe level?
For example, if I create a visualization I need weighted averages of the values (with filters) and Dataiku offers aggregates like count, min, max and average only.
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
Hello @bratman
.My pleasure to help with what I can.
I'm not sure if I understand correctly your question: do you mean if there is a way to calculate the weighted average in a single "visual recipe", without having to go through the "pivot" recipe?
Or you are talking about the visualization tools, like the pivot tables?