## Sign up to take part

Registered users can ask their own questions, contribute to discussions, and be part of the Community!

This website uses cookies. By clicking OK, you consent to the use of cookies. Read our cookie policy.

Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results forย

Survey banner

The Dataiku Community is moving to a new home! Some short term disruption starting next week: LEARN MORE

Registered users can ask their own questions, contribute to discussions, and be part of the Community!

- Community
- ยป
- Switching to Dataiku
- ยป
- Forum
- ยป

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Solved!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

How to total a column?

Hello -

I'm trying to calculate the percentage of revenue of the total revenue for each region, which requires me knowing the total revenue.

After all my cleansing and recipes, I'm left with this table:

I now want to calculate each regions percentage contribution to total revenue (adding up all the Revenue_Sum column).

In Alteryx, I'd use the summarize node to calculate the total and then append it back onto the table so that I can then easily do a calculation on each row.

How do I calculate a column total?

I'm not sure why I'm so tripped up on a seemingly simple problem but I think it's because my mind is still stuck a little in the "Alteryx way of doing things"...so I appreciate any help and assistance being new to Dataiku.

Thanks

Chris

1 Solution

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi,

Aggregations are not directly done in the Prepare recipe. Instead, they are done either in the "Group" recipe or the "Window" recipe.

- If at the end, you want a dataset with just "Sales Area" and "Total revenue of this sales area", and just one row per Sales Area, use a Group recipe with "group by: Sales Area", and "Aggregate: SUM of Total revenue" (https://knowledge.dataiku.com/latest/courses/basics/group-data/group-data-summary.html)
- If at the end, you want a dataset with as many rows as previously, and just add a column that is the sum of revenue for this sales area (so that for example you can then compute a ratio), use a Window recipe with "partition by: Sales Area", "window: unbounded" and "Aggregate: SUM of Total revenue" (https://knowledge.dataiku.com/latest/courses/visual-recipes/window.html and https://knowledge.dataiku.com/latest/courses/visual-recipes/window/window.html - checkout in particular "add average price as a column)

Your use case seems to be a small variant of the second one: you want a dataset with as many rows as previously, and just add a column that is the sum of revenue for **all **sales areas. For that, you would use a Window recipe "window: unbounded" and "Aggregate: SUM of Total revenue". See how this time we did not "partition" it, so it will compute the global sum instead of a sum per "partition".

Grouping and Windowing are quite powerful and can do a lot of different things, so we recommend reading the linked articles. The big difference to remember is:

- After grouping, your dataset has fewer rows, because all rows have been grouped together
- After windowing, your dataset still has many rows as before, but has additional columns that correspond to aggregations

Hope this helps,

Solutions shown first - Read whole discussion

3 Replies

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi,

Aggregations are not directly done in the Prepare recipe. Instead, they are done either in the "Group" recipe or the "Window" recipe.

- If at the end, you want a dataset with just "Sales Area" and "Total revenue of this sales area", and just one row per Sales Area, use a Group recipe with "group by: Sales Area", and "Aggregate: SUM of Total revenue" (https://knowledge.dataiku.com/latest/courses/basics/group-data/group-data-summary.html)
- If at the end, you want a dataset with as many rows as previously, and just add a column that is the sum of revenue for this sales area (so that for example you can then compute a ratio), use a Window recipe with "partition by: Sales Area", "window: unbounded" and "Aggregate: SUM of Total revenue" (https://knowledge.dataiku.com/latest/courses/visual-recipes/window.html and https://knowledge.dataiku.com/latest/courses/visual-recipes/window/window.html - checkout in particular "add average price as a column)

Your use case seems to be a small variant of the second one: you want a dataset with as many rows as previously, and just add a column that is the sum of revenue for **all **sales areas. For that, you would use a Window recipe "window: unbounded" and "Aggregate: SUM of Total revenue". See how this time we did not "partition" it, so it will compute the global sum instead of a sum per "partition".

Grouping and Windowing are quite powerful and can do a lot of different things, so we recommend reading the linked articles. The big difference to remember is:

- After grouping, your dataset has fewer rows, because all rows have been grouped together
- After windowing, your dataset still has many rows as before, but has additional columns that correspond to aggregations

Hope this helps,

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello - Wow, thank you for that detailed explanation and the links. It's opened my eyes to a bunch of new possibilities!

You've perfectly identified my use case with the suggestion of the window being unbounded so I can add a column that is the sum of revenue for all sales areas.

However, I seem to be only able to get the CUMULATIVE sum, and not the Total sum for all rows:

Here is my Window definition:

Here is my aggregation:

And with that, the output actually adds a cumulative sum:

So it's close...but I would need the total sum on each row.

Any pointers on what I'm doing wrong is greatly appreciated.

Thanks again

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Didn't Find What You Needed?