How to total a column?

Solved!
cdutoit
Level 3
How to total a column?

Hello -

I'm trying to calculate the percentage of revenue of the total revenue for each region, which requires me knowing the total revenue.

After all my cleansing and recipes, I'm left with this table:

Screen Shot 2021-09-19 at 4.48.40 PM.png

 

I now want to calculate each regions percentage contribution to total revenue (adding up all the Revenue_Sum column).

In Alteryx, I'd use the summarize node to calculate the total and then append it back onto the table so that I can then easily do a calculation on each row.

How do I calculate a column total?

I'm not sure why I'm so tripped up on a seemingly simple problem but I think it's because my mind is still stuck a little in the "Alteryx way of doing things"...so I appreciate any help and assistance being new to Dataiku.

 

Thanks

Chris

 

0 Kudos
1 Solution
Clรฉment_Stenac

Hi,

Aggregations are not directly done in the Prepare recipe. Instead, they are done either in the "Group" recipe or the "Window" recipe.

Your use case seems to be a small variant of the second one: you want a dataset with as many rows as previously, and just add a column that is the sum of revenue for all sales areas. For that, you would use a Window recipe "window: unbounded" and "Aggregate: SUM of Total revenue". See how this time we did not "partition" it, so it will compute the global sum instead of a sum per "partition".

Grouping and Windowing are quite powerful and can do a lot of different things, so we recommend reading the linked articles. The big difference to remember is:

  • After grouping, your dataset has fewer rows, because all rows have been grouped together
  • After windowing, your dataset still has many rows as before, but has additional columns that correspond to aggregations

Hope this helps,

View solution in original post

3 Replies
Clรฉment_Stenac

Hi,

Aggregations are not directly done in the Prepare recipe. Instead, they are done either in the "Group" recipe or the "Window" recipe.

Your use case seems to be a small variant of the second one: you want a dataset with as many rows as previously, and just add a column that is the sum of revenue for all sales areas. For that, you would use a Window recipe "window: unbounded" and "Aggregate: SUM of Total revenue". See how this time we did not "partition" it, so it will compute the global sum instead of a sum per "partition".

Grouping and Windowing are quite powerful and can do a lot of different things, so we recommend reading the linked articles. The big difference to remember is:

  • After grouping, your dataset has fewer rows, because all rows have been grouped together
  • After windowing, your dataset still has many rows as before, but has additional columns that correspond to aggregations

Hope this helps,

cdutoit
Level 3
Author

Hello - Wow, thank you for that detailed explanation and the links. It's opened my eyes to a bunch of new possibilities!

You've perfectly identified my use case with the suggestion of the window being unbounded so I can add a column that is the sum of revenue for all sales areas.

However, I seem to be only able to get the CUMULATIVE sum, and not the Total sum for all rows:

Here is my Window definition:

 

Screen Shot 2021-09-20 at 6.49.53 AM.png

Here is my aggregation:

Screen Shot 2021-09-20 at 6.51.18 AM.png

And with that, the output actually adds a cumulative sum:

Screen Shot 2021-09-20 at 6.50.55 AM.png

So it's close...but I would need the total sum on each row.

Any pointers on what I'm doing wrong is greatly appreciated.

Thanks again

0 Kudos
cdutoit
Level 3
Author

Update: Thanks @Clรฉment_Stenac I got it to work by toggling the Window Frame option. While I don't fully understand the explanation, this post by @fchataigner2 references the issue. I have some more reading to do to fully grasp this but it seems like it is working now.

0 Kudos