
Remove Duplicates based on one column

Level 1

Hi All

Here is my requirement.

I want to remove duplicate rows based on one column. Is there a way to do this with DSS recipes?

Please advise 

Thanks in advance 🙂

Dataiker

@Renga3037 In the "Distinct" visual recipe, you can remove duplicates based on either all columns or a subset of columns, which can be a single column. If you choose a single column, the output contains only that column, with its distinct values. If you need some logic to decide which row to keep (for example, the first value according to a sort order), look at the Window recipe, which lets you compute First, Last, Lag, etc.

 

 

 

Level 1
Author

@GCase Agreed, but I want all the columns in the output, not only the distinct column!

Dataiker

Can you be more specific?

For example, assume ID is the column whose values should be unique:

ID  First_Name  Last_Name  Year_Entered
1   Lebron      James      2004
2   Michael     Jordan     1985
2   Larry       Bird       1980

What would the dataset you returned look like based on that list?

@Renga3037 

Level 1
Author

@GCase

Correct, I want unique IDs (duplicates removed).

The result would look like this:

1   Lebron      James      2004
2   Michael     Jordan     1985

 

Hope that makes sense.

 

Dataiker

Why Michael Jordan? Was it because that was the first row for that ID, or for some other reason?

Grant

@Renga3037 

Level 1

I think it's because he wants to keep the first row seen for each distinct ID. @GCase

Unfortunately, the distinct recipe as it is in DSS won't allow this.

Two solutions come to mind:

Write this request in SQL (DISTINCT ON is PostgreSQL-specific), something like:

SELECT DISTINCT ON (your_column) your_table.*
FROM your_table
ORDER BY your_column;
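If your database is not PostgreSQL, DISTINCT ON will not be available. A rough equivalent is to number the rows inside each ID with ROW_NUMBER() and keep the rows numbered 1. Below is a minimal sketch using Python's built-in sqlite3 module (it needs SQLite 3.25 or later for window functions). The table name `players` and the function name are made up for illustration, and ordering by SQLite's `rowid` is only a stand-in for "first row seen"; on a real table you would order by a real column.

```python
import sqlite3

# Sample rows from this thread: (ID, First_Name, Last_Name, Year_Entered)
ROWS = [
    (1, "Lebron", "James", 2004),
    (2, "Michael", "Jordan", 1985),
    (2, "Larry", "Bird", 1980),
]


def first_row_per_id(rows):
    """Keep one full row per id: the first one inserted (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE players "
        "(id INTEGER, first_name TEXT, last_name TEXT, year_entered INTEGER)"
    )
    conn.executemany("INSERT INTO players VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT id, first_name, last_name, year_entered
        FROM (
            -- rn restarts at 1 for every id; rowid follows insertion order
            SELECT p.*,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY rowid) AS rn
            FROM players AS p
        )
        WHERE rn = 1
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in first_row_per_id(ROWS):
    print(row)
```

Changing the ORDER BY inside the OVER clause changes which row survives for each ID, which is exactly the question Grant asked above.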

OR

Add a Window recipe first to compute a rank column within each value of your column, then keep only the rows where the rank is 1 by using it as a pre-filter in the Distinct recipe.
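To make the rank-then-filter idea concrete, here is a small plain-Python sketch of the logic (not the DSS recipe itself). The function names `add_rank` and `keep_rank_one` are hypothetical. The rank is computed within each ID, either in the order the rows arrive or sorted by a column you pick, and only rank 1 is kept.

```python
from collections import defaultdict

# Sample rows from this thread
PLAYERS = [
    {"ID": 1, "First_Name": "Lebron", "Last_Name": "James", "Year_Entered": 2004},
    {"ID": 2, "First_Name": "Michael", "Last_Name": "Jordan", "Year_Entered": 1985},
    {"ID": 2, "First_Name": "Larry", "Last_Name": "Bird", "Year_Entered": 1980},
]


def add_rank(rows, partition_col, order_col=None):
    """Return copies of rows with a 'rank' field, numbered from 1 within each partition.

    With order_col=None the input order is kept; otherwise each partition is
    sorted ascending by order_col before ranking.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_col]].append(row)
    ranked = []
    for group in groups.values():
        if order_col is not None:
            group = sorted(group, key=lambda r: r[order_col])
        for rank, row in enumerate(group, start=1):
            ranked.append({**row, "rank": rank})
    return ranked


def keep_rank_one(rows, partition_col, order_col=None):
    """Keep only the rank-1 row of each partition and drop the helper column."""
    return [
        {k: v for k, v in row.items() if k != "rank"}
        for row in add_rank(rows, partition_col, order_col)
        if row["rank"] == 1
    ]


print([r["Last_Name"] for r in keep_rank_one(PLAYERS, "ID")])
print([r["Last_Name"] for r in keep_rank_one(PLAYERS, "ID", "Year_Entered")])
```

Keeping the input order retains Michael Jordan for ID 2, while ranking by Year_Entered retains Larry Bird instead, so the sort you choose in the Window recipe decides which duplicate is kept.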

@Renga3037 
