Remove Duplicates - Keep first/last

Solved!
bhobbs
Level 1
Is anyone familiar with a Dataiku visual recipe or formula that replicates the following Python code?

df.drop_duplicates(subset=['col1'], keep='first', inplace=True)

The Distinct recipe does not quite accomplish this: it either removes duplicates based on all columns, or it returns only the subset of columns you selected.

1 Solution
MiguelangelC
Dataiker

Hi,

A Group recipe can be used for this task.
Make 'col1' the group key and, for each of the other columns, choose to keep the first value. For example:

 

examplededup.PNG
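
For anyone who wants to sanity-check the idea outside of Dataiku, here is a minimal pandas sketch of the same "group by col1, keep the first value of everything else" logic, compared against the drop_duplicates call from the question. The sample data and the column names col2 and col3 are made up for illustration. One caveat: pandas' groupby first() returns the first non-null value per column, so the two approaches only match row for row when the data has no missing values. Whether the Group recipe treats empty values the same way is something to verify on your own data.

```python
import pandas as pd

# Hypothetical sample data; only 'col1' comes from the original question.
df = pd.DataFrame({
    "col1": ["a", "b", "a", "c", "b"],
    "col2": [1, 2, 3, 4, 5],
    "col3": ["x", "y", "z", "w", "v"],
})

# The behaviour asked about: keep the first row seen for each value of col1.
deduped = df.drop_duplicates(subset=["col1"], keep="first").reset_index(drop=True)

# The "group" formulation: col1 is the group key and every other column
# keeps its first value. sort=False preserves order of first appearance.
# Note: first() skips nulls, so this equals drop_duplicates only when
# there are no missing values in the non-key columns.
grouped = df.groupby("col1", sort=False, as_index=False).first()

print(deduped)
print(grouped)
```

On this sample both frames contain the same three rows, one per distinct value of col1.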
