Submit your use case or success story to the 2023 edition of the Dataiku Frontrunner Awards ENTER YOUR SUBMISSION

Count rows within group

Solved!
emate
Count rows within group

Hi Team,

I have two tables:

Main table: with orginal fileds date, international_product, group, country, sales_cd

image.png

and the table that I want to join with my main table - i am using left join, and setting connection between date, international_product,country

image.png

As you can see in orginal table, each product can be in more than 1 group, this is the reason when I am joining column sales_lcd_mkt I'm ending up with duplicates in my sales_lcd_mkt column. My first idea is to use somehow window recipe, and count rows within country/product/date/group window to get my rows counted as like in my "output_idea", next step would be to clear cells in sales_lcd_mkt where value is NOT equal 1, that way I would remove duplicates. But I am failing to get rows counted with unique number - could anyone let me know if that is possible? Or maybe I can avoid these duplicates in my join recipe and at the same time keep orginal table as it is?

Maybe it is very easy to do, but I am stuck, and I would really appreciate any help

 

Thanks,

Mateusz

0 Kudos
1 Solution
fchataigner2
Dataiker

Hi,

if you do a window partitioned by country/product/date, ordered by group, with a frame without limits (ie active frame but no limit on preceding or following rows), then you can use a 'count' aggregate to get the number of rows in each country/product/date batch, and a dense rank (checkbox over the window aggregates) to retrieve the order of rows within the batch

View solution in original post

2 Replies
fchataigner2
Dataiker

Hi,

if you do a window partitioned by country/product/date, ordered by group, with a frame without limits (ie active frame but no limit on preceding or following rows), then you can use a 'count' aggregate to get the number of rows in each country/product/date batch, and a dense rank (checkbox over the window aggregates) to retrieve the order of rows within the batch

emate
Author

Thanks! this solution works fine! Now I can always relate to column 'denserank' =! 1 - and based on that clear duplicated cells!

0 Kudos