Remove Duplicates - Keep first/last
bhobbs
Is anyone familiar with a Dataiku visual recipe or formula that replicates the following Python code?
df.drop_duplicates(subset=['col1'], keep='first', inplace=True)
The Distinct recipe does not quite do this: it either removes duplicates based on all columns, or returns only the subset of columns you selected.
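To pin down the behavior being asked for, here is a small pandas sketch of what `drop_duplicates(subset=['col1'], keep=...)` does. The frame and the extra column names (`col2`, `col3`) are made up for illustration. The point is that duplicates are detected on `col1` only, yet every column of the surviving row is kept, which is exactly what the Distinct recipe does not give you. Assignment is used instead of `inplace=True` so both variants can be compared.

```python
import pandas as pd

# Illustrative data only: 'col1' has duplicate keys, 'col2'/'col3' are hypothetical extra columns.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "c"],
    "col2": [1, 2, 3, 4, 5],
    "col3": ["x", "y", "z", "w", "v"],
})

# keep='first': one row per distinct col1 value, the earliest occurrence, all columns retained.
first = df.drop_duplicates(subset=["col1"], keep="first")

# keep='last': one row per distinct col1 value, the latest occurrence instead.
last = df.drop_duplicates(subset=["col1"], keep="last")

print(first)
print(last)
```

With this data, `first` keeps the rows where `col2` is 1, 3 and 5, and `last` keeps the rows where `col2` is 2, 4 and 5.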
Best Answer
Miguel Angel (Dataiker)
Hi,
A Group recipe can do this.
Make 'col1' the group key and, for each of the other columns, choose the "First" aggregation so that its first value is kept. For example:
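For readers who want to check the idea in code, below is a pandas analogue of that Group recipe setup: group by `col1` and take the first value of every other column. This is only a sketch of the same logic in pandas, not how DSS implements the recipe, and the data and column names are hypothetical. One caveat worth knowing: pandas' `GroupBy.first()` returns the first *non-null* value per column, so with missing data it can mix values from different rows, while `GroupBy.head(1)` keeps the whole first row exactly like `drop_duplicates`. This sketch does not establish whether the Group recipe's "First" aggregation treats empty cells the same way.

```python
import pandas as pd

# Hypothetical data, same shape as in the question.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "c"],
    "col2": [1, 2, 3, 4, 5],
    "col3": ["x", "y", "z", "w", "v"],
})

# Group-recipe analogue: 'col1' is the group key, every other column aggregated with "first".
# sort=False keeps groups in order of first appearance, as drop_duplicates does.
grouped = df.groupby("col1", sort=False, as_index=False).first()

# Reference result from the question's drop_duplicates call.
deduped = df.drop_duplicates(subset=["col1"], keep="first").reset_index(drop=True)
print(grouped.equals(deduped))  # True for this null-free data

# keep='last' counterpart: tail(1) keeps the whole last row of each group.
last_rows = df.groupby("col1", sort=False).tail(1)

# Caveat with missing values: first() skips nulls per column...
df_nan = pd.DataFrame({"col1": ["a", "a"], "col2": [None, 2.0]})
first_nonnull = df_nan.groupby("col1", sort=False).first()["col2"].tolist()  # [2.0]
# ...whereas head(1) returns the actual first row, matching drop_duplicates.
first_row = df_nan.groupby("col1", sort=False).head(1)["col2"].tolist()  # [nan]
print(first_nonnull, first_row)
```

If the data may contain empty values and the exact first row matters, the `head(1)` form is the safer pandas equivalent.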