The Dataiku Frontrunner Awards have just launched to recognize your achievements! Submit Your Entry

Counting Similar Rows

davidkubica1
Level 1
Counting Similar Rows

I have a dataset. Four of the columns are Location Number, Employee ID, Date, and Symbol. I would like to create a 5th column that contains counts of how many times a row with matching data in all four columns appears.

For example, there are 5 rows with Location Number=308, Employee ID=37, Date=03/23/2007 and Symbol=TP. The 5th column therefore is "5".

Any suggestions on how I could go about doing this?

0 Kudos
1 Reply
fchataigner2
Dataiker
Dataiker

you can make a window recipe, partitioned on these 4 columns, with some column as ordering column (doesn't matter which), and a window frame activated with no limit on preceding nor following rows. And in the aggregates section, activate count on some column with no empty value (employee id for example). This will add on each row the count of rows with the same combination of the 4 columns used for partitioning.  

A banner prompting to get Dataiku DSS