DSS sum and group by

Solved!
emate
Level 5
DSS sum and group by

Hi All,

I have a dataset [attached example] and I want to create 2 tables out of this;

1st - sum(sales) by market and exclude duplicated rows, so  I want to end up with Sales for each market in specific date rage (Data column) but exclude duplicated - I have them because 1 product can be in more than 1 group 

So  first table would look like:

market sales data
MRCC I 180   43831... etc

Then second table I would like to look like above one, but add as a 'dictionary' aditionall column with uniqe product name within Market and Date, so for MRCC I it would look like:

market product sales
MRCC I ELI 180
MRCC I SUT 180

Can I achive this with only visual recipes? Or I need some SQL part of code? if yes could you give me a hint on a syntax 'structure' So i can replicate this in my case?

I was trying to use group by, but I couldn't remove those duplicated product rows.

Thanks in advance for any kind of help,

eM

1 Solution
JacquelineK
Dataiker

Hi @emate,

In the attached PDF (containing screenshots) and the paragraphs below, I've outlined a couple things you can try in DSS that may answer your questions.

For the first table, you can use the Group recipe (as you mentioned) using โ€œmarketโ€ and โ€œdataโ€ as the Group Keys and  โ€œsumโ€ sales in the Per Field Aggregations window. To remove duplicates in the Group recipe, you can use the Post-filter to keep only Distinct rows. Additionally, in Pre-filter you can remove duplicates before doing the groupby.

For the second table, you can use a Window recipe. Select โ€œmarketโ€ and โ€œdataโ€ as the partitioning columns (the fields you will use to group the dataset), and โ€œsumโ€ sales in Aggregations. Finally, remove duplicates in Post-filter. Window recipes are fantastic and are quite useful for many things. This page has more info if you're interested: https://academy.dataiku.com/latest/tutorial/visual-recipes/window.html

Hope that helps!

Jacqueline

View solution in original post

5 Replies
JacquelineK
Dataiker

Hi @emate,

In the attached PDF (containing screenshots) and the paragraphs below, I've outlined a couple things you can try in DSS that may answer your questions.

For the first table, you can use the Group recipe (as you mentioned) using โ€œmarketโ€ and โ€œdataโ€ as the Group Keys and  โ€œsumโ€ sales in the Per Field Aggregations window. To remove duplicates in the Group recipe, you can use the Post-filter to keep only Distinct rows. Additionally, in Pre-filter you can remove duplicates before doing the groupby.

For the second table, you can use a Window recipe. Select โ€œmarketโ€ and โ€œdataโ€ as the partitioning columns (the fields you will use to group the dataset), and โ€œsumโ€ sales in Aggregations. Finally, remove duplicates in Post-filter. Window recipes are fantastic and are quite useful for many things. This page has more info if you're interested: https://academy.dataiku.com/latest/tutorial/visual-recipes/window.html

Hope that helps!

Jacqueline

emate
Level 5
Author

Hi @JacquelineK 

Thanks a lot for your input, I will definitely try this.  Your outputs looks almost as they should, only there is still duplicated sum of sales in both of them, for example,  MRCC I in both tables should be 180. But I hope with some fuctions you mentioned I will be able to handle this.

JacquelineK
Dataiker

Hi @emate ,

Ah, yes there are duplicate Market--Sales pairs in the dataset. You may have found a way to fix it already, but here are some thoughts:

In this case, removing duplicates in Pre-filter in the Group Recipe won't work because it looks for duplicates across entire rows. While Market and Sales is the same in multiple rows, the Group field makes these rows distinct.

You could drop the Group field in a prepare recipe first. Then, remove duplicates using Pre-filter in the Group Recipe to get 180 for MRCC I. Alternatively, you can also remove duplicates in the Prepare Recipe. Using Filter in Pre-filter might also work for you depending on your use case.

emate
Level 5
Author

Hi @JacquelineK 

I'm almost done with the first table, first I removed duplicates (based on product, sales, , data, market), then I am using your group recipe and it giving me correct result for sales per market, but my 'date' column is missing, and im not sure why... is there a way to keep it? I will need it later to join this table with my orginal data. 

 

**** Ok, nevermind, I got confused, I should group it by market, sales and DATE... sorry for the confusion, i'm in the process of transition between tools I used to for last 6 years.. ๐Ÿ™‚

JacquelineK
Dataiker

@emate Glad that worked for you!