Count rows within group

Mateusz
Mateusz Dataiku DSS Core Designer, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 91 ✭✭✭✭✭✭

Hi Team,

I have two tables:

Main table: with orginal fileds date, international_product, group, country, sales_cd

image.png

and the table that I want to join with my main table - i am using left join, and setting connection between date, international_product,country

image.png

As you can see in orginal table, each product can be in more than 1 group, this is the reason when I am joining column sales_lcd_mkt I'm ending up with duplicates in my sales_lcd_mkt column. My first idea is to use somehow window recipe, and count rows within country/product/date/group window to get my rows counted as like in my "output_idea", next step would be to clear cells in sales_lcd_mkt where value is NOT equal 1, that way I would remove duplicates. But I am failing to get rows counted with unique number - could anyone let me know if that is possible? Or maybe I can avoid these duplicates in my join recipe and at the same time keep orginal table as it is?

Maybe it is very easy to do, but I am stuck, and I would really appreciate any help

Thanks,

Mateusz

Best Answer

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker
    Answer ✓

    Hi,

    if you do a window partitioned by country/product/date, ordered by group, with a frame without limits (ie active frame but no limit on preceding or following rows), then you can use a 'count' aggregate to get the number of rows in each country/product/date batch, and a dense rank (checkbox over the window aggregates) to retrieve the order of rows within the batch

Answers

  • Mateusz
    Mateusz Dataiku DSS Core Designer, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 91 ✭✭✭✭✭✭

    Thanks! this solution works fine! Now I can always relate to column 'denserank' =! 1 - and based on that clear duplicated cells!

Setup Info
    Tags
      Help me…