Remove Duplicates - Keep first/last
Is anyone familiar with a Dataiku visual recipe or formula that replicates the following Python code?
df.drop_duplicates(subset=['col1'], keep='first', inplace=True)
The Distinct recipe does not quite accomplish this: it either removes duplicates based on all columns, or returns only the subset of columns you selected.
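To make the difference concrete, here is a minimal pandas sketch on a small hypothetical DataFrame (the values are made up for illustration). It contrasts the two Distinct-style behaviours described above with what `drop_duplicates(subset=..., keep=...)` actually does.

```python
import pandas as pd

# Hypothetical sample data: col1 has duplicate keys, col2/col3 differ per row.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "c"],
    "col2": [1, 2, 3, 4, 5],
    "col3": ["x", "y", "z", "w", "v"],
})

# Distinct on all columns: nothing is removed, because every full row is unique.
distinct_all = df.drop_duplicates()

# Distinct on the subset only: deduplicated, but col2 and col3 are lost.
distinct_subset = df[["col1"]].drop_duplicates()

# Desired behaviour: one row per col1 value, keeping all columns from the
# first occurrence (keep="last" keeps the last occurrence instead).
first_rows = df.drop_duplicates(subset=["col1"], keep="first")
last_rows = df.drop_duplicates(subset=["col1"], keep="last")

print(first_rows)
```

Only the last two calls both deduplicate on `col1` and keep the remaining columns, which is the behaviour the question is after.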
Best Answer
Miguel Angel (Dataiker)
A Group recipe can do this.
Make 'col1' the group key and, for each of the other columns, select the aggregation that keeps the first value.
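As a rough pandas analogue of that Group recipe setup (a sketch, not Dataiku's implementation), grouping on `col1` and taking the first value of every other column reproduces `drop_duplicates(keep='first')` on clean data. One caveat worth checking: pandas `GroupBy.first()` takes the first *non-null* value per column, so with empty cells it can combine values from different rows. Whether Dataiku's "first value" aggregation behaves the same way is not confirmed here, so test it on your own data. `groupby(...).head(1)` keeps the whole first row instead. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical sample data with duplicate keys in col1.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "c"],
    "col2": [1, 2, 3, 4, 5],
    "col3": ["x", "y", "z", "w", "v"],
})

# Group-recipe analogue: col1 is the group key, every other column takes its
# first value. sort=False keeps groups in order of first appearance.
grouped_first = df.groupby("col1", as_index=False, sort=False).first()

# Reference result, with the index reset so the two can be compared.
deduped = df.drop_duplicates(subset=["col1"], keep="first").reset_index(drop=True)

# Caveat: with nulls, first() skips missing values column by column.
df_nulls = pd.DataFrame({"col1": ["a", "a"], "col2": [None, 2.0], "col3": ["x", "y"]})
mixed = df_nulls.groupby("col1", as_index=False, sort=False).first()
# mixed takes col2 from the second row but col3 from the first row.

# head(1) keeps the entire first row per group, nulls included,
# which matches drop_duplicates(keep="first").
whole_rows = df_nulls.groupby("col1", sort=False).head(1)
```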