Count rows within group

Solved!
emate
Level 5
Count rows within group

Hi Team,

I have two tables:

Main table: with orginal fileds date, international_product, group, country, sales_cd

image.png

and the table that I want to join with my main table - i am using left join, and setting connection between date, international_product,country

image.png

As you can see in orginal table, each product can be in more than 1 group, this is the reason when I am joining column sales_lcd_mkt I'm ending up with duplicates in my sales_lcd_mkt column. My first idea is to use somehow window recipe, and count rows within country/product/date/group window to get my rows counted as like in my "output_idea", next step would be to clear cells in sales_lcd_mkt where value is NOT equal 1, that way I would remove duplicates. But I am failing to get rows counted with unique number - could anyone let me know if that is possible? Or maybe I can avoid these duplicates in my join recipe and at the same time keep orginal table as it is?

Maybe it is very easy to do, but I am stuck, and I would really appreciate any help

 

Thanks,

Mateusz

0 Kudos
1 Solution
fchataigner2
Dataiker

Hi,

if you do a window partitioned by country/product/date, ordered by group, with a frame without limits (ie active frame but no limit on preceding or following rows), then you can use a 'count' aggregate to get the number of rows in each country/product/date batch, and a dense rank (checkbox over the window aggregates) to retrieve the order of rows within the batch

View solution in original post

2 Replies
fchataigner2
Dataiker

Hi,

if you do a window partitioned by country/product/date, ordered by group, with a frame without limits (ie active frame but no limit on preceding or following rows), then you can use a 'count' aggregate to get the number of rows in each country/product/date batch, and a dense rank (checkbox over the window aggregates) to retrieve the order of rows within the batch

emate
Level 5
Author

Thanks! this solution works fine! Now I can always relate to column 'denserank' =! 1 - and based on that clear duplicated cells!

0 Kudos