Weighted Average - Pivot Recipe

bratman
bratman Registered Posts: 3 ✭✭✭

Hello all,

I've been trying to populate Pivot recipe output with weighted averages and failing.

For example: if values are Vx and the corresponding weights are Wx, weighted average is the

(Sum(Vx * Wx)/Sum(Wx)).

In Excel, you create a helper column of the numerator (Vx * Wx) and create a calculated field as Sum(Helper column/Weights).

Example:

Input table:

CategoryValues (Vx)Weights (Wx)Helper Column for Excel (Vx * Wx)
Category 11001.0100
Category 12000.5100
Category 24000.280
Category 21500.8120

Output table:

CategoryWeighted Average CalculationWeighted Output
Category 1(100 + 100)/(1+0.5)133.33
Category 2(80 + 120)/(0.2 + 0.8)200

Any solutions would be greatly appreciated!

Answers

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron

    Hi @bratman
    ,

    Do you have a sample dataset representing your input, and what is the expected output? I can think of different ways of achieving what you need, but I'm not sure what kind of "pivot" are you doing first.

    Also, if I understand correctly, the weighted average is Sum(Vx * Wx)/Sum(Wx), but you wrote:

    Sum(Sum(Vx * Wx)/Sum(Wx))

    Which looks like a sum of the weighted averages. Is that a final aggregation you do?

    Cheers!

  • bratman
    bratman Registered Posts: 3 ✭✭✭

    Hello @Ignacio_Toledo
    , I have added an example. It looks like I cannot add any computations with more than 1 column involved. Any workaround would be appreciated!

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron

    Hi @bratman
    ,

    Thanks for providing example data. Here is my solution in a recorded video.

    Basically, there are 3 stages to get from the input to the output:

    1. Prepare recipe to create the column with the "weighted values" for each row (vx * wx)
    2. Group by Category, and sum the weights and weighted values
    3. Prepare recipe to create the column with the final weighted average.

    I hope this helps. There are other options, like using a python recipe, in which case you only need one step. But I guess the idea is to see how to do this with the visual recipes. Maybe other users have a more elegant solution?

    Please let me know if you have any doubts.

  • bratman
    bratman Registered Posts: 3 ✭✭✭

    Hello @Ignacio_Toledo
    , thanks a lot for your answer.

    Is there any way I could calculate the weighted average at the visual recipe level?

    For example, if I create a visualization I need weighted averages of the values (with filters) and Dataiku offers aggregates like count, min, max and average only.

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron

    Hello @bratman
    .

    My pleasure to help with what I can.

    I'm not sure if I understand correctly your question: do you mean if there is a way to calculate the weighted average in a single "visual recipe", without having to go through the "pivot" recipe?

    Or you are talking about the visualization tools, like the pivot tables?

Setup Info
    Tags
      Help me…