
Duplicate removal


How can I remove duplicate rows based on a single column, without Python code?



You can use a Group recipe with this column as the group key. The output will then have exactly one row per distinct value of that column. You still need to decide what to do with the values from the other columns.

DSS provides a lot of choices:

[Screenshot: aggregation options in the Group recipe]

To explain some of them:

  1. Concat: concatenates the values from all rows in the group; you can specify the separator.
  2. Avg: for numerical types such as integer, computes the average.
  3. Distinct: computes the number of distinct values found in the column.
  4. For the rest, have a look at the documentation.
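The Group recipe itself needs no code. Purely to illustrate what these aggregations compute, here is a minimal plain-Python sketch. The sample rows, the column names (`customer_id`, `city`, `amount`), and the `aggregate` helper are hypothetical and are not part of DSS.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; column names are illustrative only.
rows = [
    {"customer_id": 1, "city": "Paris", "amount": 10},
    {"customer_id": 1, "city": "Lyon", "amount": 20},
    {"customer_id": 2, "city": "Paris", "amount": 5},
    {"customer_id": 3, "city": "Nice", "amount": 7},
    {"customer_id": 3, "city": "Nice", "amount": 9},
]


def aggregate(rows, key):
    """Return one output row per distinct value of `key`."""
    # Collect the rows that share the same key value.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    result = []
    for value, members in groups.items():
        cities = [m["city"] for m in members]
        result.append({
            key: value,
            # "Concat": join all values in the group with a separator.
            "city_concat": ",".join(cities),
            # "Avg": average of a numerical column.
            "amount_avg": mean(m["amount"] for m in members),
            # "Distinct": number of distinct values in the group.
            "city_distinct": len(set(cities)),
        })
    return result


deduped = aggregate(rows, "customer_id")
```

Each `customer_id` appears once in `deduped`; the other columns are reduced to a single value per group by whichever aggregation was chosen.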




@Karthikeyanvenk,

Welcome to the Dataiku Community.  We are so glad to have you join us.

There are a number of ways to remove duplicates.

Some are described in this thread.

When I need to remove duplicates reliably, and I know how to order the duplicate records so that the ones I want to keep come first, I tend to use the Window recipe.

I tend to use the method described in this community post.
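The linked post is not reproduced here, so the following is only a sketch of the general window-style pattern, which may differ from the method in that post: partition by the key column, order each partition, number the rows, and keep the row ranked 1. In DSS this is done visually in the Window recipe; the plain-Python version below only illustrates the logic. The sample rows, the column names, and the `keep_first_per_key` helper are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows; column names are illustrative only.
rows = [
    {"customer_id": 1, "updated": "2024-01-05", "city": "Paris"},
    {"customer_id": 1, "updated": "2024-03-01", "city": "Lyon"},
    {"customer_id": 2, "updated": "2024-02-10", "city": "Paris"},
    {"customer_id": 3, "updated": "2024-01-01", "city": "Nice"},
    {"customer_id": 3, "updated": "2023-12-31", "city": "Nice"},
]


def keep_first_per_key(rows, key, order_by, descending=True):
    """Keep the top-ranked row of each partition defined by `key`."""
    # Sort by the ordering column first, then stable-sort by the key,
    # so rows inside each partition stay in the requested order.
    ordered = sorted(rows, key=itemgetter(order_by), reverse=descending)
    ordered.sort(key=itemgetter(key))

    kept = []
    for _, partition in groupby(ordered, key=itemgetter(key)):
        # Number the rows in the partition and keep only rank 1.
        for rank, row in enumerate(partition, start=1):
            if rank == 1:
                kept.append(row)
    return kept


latest = keep_first_per_key(rows, key="customer_id", order_by="updated")
```

Unlike grouping, this keeps one complete original row per key, so no aggregation choice is needed for the other columns; the ordering decides which duplicate survives.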

I also note that there is a Distinct visual recipe. I think it was added to Dataiku DSS after I learned the window trick.

Hope one of these ways helps.  Let us know how you are getting along with the project you are working on.
