Duplicate removal
How can I remove duplicate rows based on a single column, without Python code?
Answers
louisbarjon (Dataiker)
Hello,
You can use a Group recipe keyed on this column: you will then get only one row per unique value of that column. But you need to decide what to do with the cells from the other columns.
DSS provides a lot of aggregation choices. To explain some of them:
- Concat concatenates all values from the other rows; you can specify the separator
- Avg computes the average, for numerical types such as integers
- Distinct computes the number of distinct values found in that column
- For the rest, have a look at the documentation
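Since the question asks for a no-code approach, the Group recipe itself is the answer; purely to illustrate what those aggregations compute, here is a minimal pandas sketch. The column names (`id`, `name`, `score`) and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical data with duplicate "id" values.
df = pd.DataFrame({
    "id":    [1, 1, 2, 3, 3, 3],
    "name":  ["a", "b", "c", "d", "e", "f"],
    "score": [10, 20, 30, 40, 50, 60],
})

# Rough equivalent of a Group recipe keyed on "id":
# - Concat of "name" with "," as the separator
# - Avg of "score"
# - Distinct count of "name"
deduped = df.groupby("id", as_index=False).agg(
    name_concat=("name", lambda s: ",".join(s)),
    score_avg=("score", "mean"),
    name_distinct=("name", "nunique"),
)
# One row remains per unique "id".
```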
tgb417 (Neuron)
Welcome to the Dataiku Community. We are so glad to have you join us.
There are a number of ways to remove duplicates.
Some are described in this thread.
https://community.dataiku.com/t5/Using-Dataiku/How-to-identify-duplicates-in-a-data-set/m-p/25831
When it comes to reliably removing duplicates, in cases where I know how to order the duplicate records so that I keep the ones I want and remove the rest, I tend to use the Window recipe.
I tend to use the method described in this community post.
https://community.dataiku.com/t5/Using-Dataiku/Is-there-a-way-to-conditionally-delete-duplicates-based-on-some/m-p/592
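To illustrate the logic behind that Window-recipe trick (partition by the key column, order the duplicates, rank them, keep rank 1), here is a hedged pandas sketch. The column names (`id`, `updated`, `value`) and the "keep the most recently updated row" ordering are assumptions for the example.

```python
import pandas as pd

# Hypothetical records: for each "id", keep the row with the
# most recent "updated" date.
df = pd.DataFrame({
    "id":      [1, 1, 2, 3, 3],
    "updated": ["2021-01-01", "2021-06-01", "2021-03-01",
                "2021-02-01", "2021-05-01"],
    "value":   ["old", "new", "only", "old", "new"],
})

# Equivalent of a Window recipe: partition by "id", order by "updated"
# descending, compute a rank within each partition, then keep rank 1.
df["rank"] = (
    df.sort_values("updated", ascending=False)
      .groupby("id")
      .cumcount() + 1
)
kept = df[df["rank"] == 1].drop(columns="rank")
```

A Filter recipe on the rank column plays the role of the final `df["rank"] == 1` step in DSS.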
I also note that there is a Distinct visual recipe. I think this was added to Dataiku DSS after I learned the window trick.
https://knowledge.dataiku.com/latest/data-preparation/visual-recipes/tutorial-distinct-recipe.html
Hope one of these ways helps. Let us know how you are getting along with the project you are working on.