Remove Duplicates - Keep first/last

bhobbs Registered Posts: 1

Is anyone familiar with a Dataiku visual recipe or formula that replicates the following Python (pandas) code?

df.drop_duplicates(subset=['col1'], keep='first', inplace=True)

The Distinct recipe does not quite accomplish this: it either removes duplicates based on all columns or returns only the subset of columns you selected.

Best Answer

  • Miguel Angel
    Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 118

    Hi,

    A Group recipe can be used for this task.
    Make 'col1' the group key and, for each of the other columns, choose to keep the first value. For example:

    [Screenshot: examplededup.PNG]
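
    For readers comparing against the original pandas call, here is a minimal sketch of the same group-and-keep-first idea expressed in pandas. The sample data and the column names 'col2' and 'col3' are made up for illustration. This only mirrors the logic of the Group recipe and says nothing about how Dataiku implements it.

```python
import pandas as pd

# Hypothetical sample data; only 'col1' comes from the question.
df = pd.DataFrame({
    "col1": ["a", "b", "a", "c", "b"],
    "col2": [1, 2, 3, 4, 5],
    "col3": ["x", "y", "z", "w", "v"],
})

# The behaviour asked about: keep the first row seen for each 'col1' value.
deduped = df.drop_duplicates(subset=["col1"], keep="first").reset_index(drop=True)

# Group-by analogue: group on 'col1', take the first value of every other column.
# sort=False keeps the keys in the order they first appear.
grouped = df.groupby("col1", sort=False, as_index=False).first()

# Keeping the last occurrence instead: keep="last" or .last().
# The two results hold the same rows, but their row order can differ.
deduped_last = df.drop_duplicates(subset=["col1"], keep="last").reset_index(drop=True)
grouped_last = df.groupby("col1", sort=False, as_index=False).last()

print(deduped)
print(grouped)
```

    One caveat on the pandas side: `first()` and `last()` return the first or last non-null value per column within each group, so when the data contains missing values the grouped result can differ from `drop_duplicates`, which keeps whole rows. It is worth checking how the Group recipe treats empty cells in your own data.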
