Remove Duplicates - Keep first/last

bhobbs · October 2022

Is anyone familiar with a Dataiku visual recipe or formula that replicates the following Python code?

df.drop_duplicates(subset=['col1'], keep='first', inplace=True)

The Distinct recipe does not quite do this: it either removes duplicates based on all columns, or returns only the subset of columns you selected.
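To make the requested behavior concrete, here is a small pandas sketch on a made-up frame. The extra columns `col2`/`col3` and the data are hypothetical. It shows that `drop_duplicates` with `subset` keeps entire rows (all columns), and that `keep` only chooses whether the first or the last occurrence of each `col1` value survives.

```python
import pandas as pd

# Hypothetical example frame; col1 has duplicate keys "a" and "b".
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "c"],
    "col2": [1, 2, 3, 4, 5],
    "col3": ["x", "y", "z", "w", "v"],
})

# Keep the first row seen for each col1 value; all columns are retained.
first = df.drop_duplicates(subset=["col1"], keep="first")

# Keep the last row seen for each col1 value instead.
last = df.drop_duplicates(subset=["col1"], keep="last")

print(first)
print(last)
```

The key point is that deduplication is decided on `col1` alone, but the output still carries every column. That is the part a Distinct on a column subset does not reproduce.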

Miguel Angel · October 2022

Hi,

A Group recipe can do this.
Make 'col1' the group key and, for each of the other columns, choose the aggregation that keeps its first value. For example:
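As a rough pandas analogue of the group-key approach (not Dataiku's implementation), the sketch below groups on `col1` and takes the first value of every other column. It uses hypothetical data and also highlights one caveat. pandas' `GroupBy.first()` returns the first *non-null* value per column, so one output row can be assembled from several input rows. `head(1)`/`tail(1)` keep whole rows instead and match `drop_duplicates(keep="first")`/`keep="last"`. Whether Dataiku's own First/Last aggregations skip empty values in the same way is an assumption to check in the recipe documentation. Also note that "first" only has meaning relative to some row order.

```python
import pandas as pd

# Hypothetical data: the first "a" row has a missing col3 value.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": [1, 2, 3, 4],
    "col3": [None, "y", "z", "w"],
})

# Group-key approach: col1 is the key, every other column takes its "first" value.
# GroupBy.first() picks the first NON-NULL value per column, so for key "a"
# col2 comes from row 0 but col3 comes from row 1.
grouped_first = df.groupby("col1", as_index=False).first()

# Row-preserving alternatives: keep the whole first/last row for each key,
# equivalent to drop_duplicates(subset=["col1"], keep="first"/"last").
rows_first = df.groupby("col1", sort=False).head(1)
rows_last = df.groupby("col1", sort=False).tail(1)

print(grouped_first)
print(rows_first)
print(rows_last)
```

If the data has no missing values, both variants give the same result, and the group-key approach reproduces `keep='first'` (or `keep='last'` with a "last" aggregation).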
