Removing duplicate columns
Dear community,
Here is my situation:
- I have a large database that needs cleaning.
- While performing the typical cleaning activities (parsing etc.) I discovered that numerous columns (hundreds, judging by a basic analysis) are just duplicates of one another, only with different names.
Example: one column is named "things_bought_on_2021_03_07", and its duplicates have names like "things_bought_on_2021_03_07_01" and "things_bought_on_2021_03_07_02".
I don't know of any way to deal with this in Dataiku. Working on duplicate rows would be easier.
Thank you!
Answers
-
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
@frotograf welcome to the community! You can achieve this through a Prepare Recipe. Since you are new to DSS, I would recommend utilizing the Dataiku Academy Core Designer Learning Path. The 102 course in the path goes over Prepare Recipes. I hope this helps!
-
I did not think of the Prepare recipe. Thanks for the tip. I've been using various resources from Dataiku and I totally love it. As a non-coder (but with an understanding of Python basics), I've found this tool to be a godsend!
-
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
Thank you for the feedback! Yes, I am very much a clicker myself, and the Prepare Recipe is a really powerful tool in DSS.
-
I'll get back to you if I don't find what I'm looking for there, so no worries. Maybe somebody else will have the same problem!
-
Thanks for the link @CoreyS. I've gone through the academy's 101 and part of 102, but I still have to click through the dataset manually. The more advanced options are for Python users, and I cannot think of an answer to my question as a no-code user. Still, the tool is fantastic and I got to remember some of the functions of DSS. I see some new additions to the software since I started using it this year!
-
Mateusz Dataiku DSS Core Designer, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 91 ✭✭✭✭✭✭
Hi @frotograf
Could you please show me a 'fake' dataset with 3-4 cases where I can see what the duplicates look like, and explain in more detail which values you want to remove? It can be a screenshot or a small portion of the data in an xlsx file.
-
@emate thanks for the post! It looks like the screenshot. The columns are the same and the rows have the same values; the only difference between them is the number at the end of the column name (the ones I marked with circles).
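For readers comfortable with a little Python, the situation described above (columns with identical values that differ only in name) could be checked with a short pandas sketch like the one below. This is only an illustration under assumptions, not a built-in DSS feature: the helper name `find_duplicate_columns` and the sample data are made up, and in a DSS Python recipe the DataFrame would come from the input dataset rather than being built by hand.

```python
import pandas as pd

def find_duplicate_columns(df):
    """Map each redundant column name to the first column holding identical values.

    Hypothetical helper for illustration. Series.equals also compares dtypes,
    so an int column and a float column with the same numbers are NOT treated
    as duplicates here.
    """
    duplicates = {}
    kept = []  # columns seen so far that are not duplicates of an earlier one
    for col in df.columns:
        for original in kept:
            if df[col].equals(df[original]):
                duplicates[col] = original
                break
        else:
            kept.append(col)
    return duplicates

# Made-up sample data mimicking the column names from the question.
df = pd.DataFrame({
    "customer": ["a", "b", "c"],
    "things_bought_on_2021_03_07": [1, 4, 2],
    "things_bought_on_2021_03_07_01": [1, 4, 2],
    "things_bought_on_2021_03_07_02": [1, 4, 2],
    "things_bought_on_2021_03_08": [3, 0, 5],
})

dupes = find_duplicate_columns(df)
cleaned = df.drop(columns=list(dupes))
```

Comparing every column against every kept column can get slow with hundreds of wide columns, so it may be worth running the check on a sample of rows first and confirming on the full data afterwards.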
-
Did you ever figure this out?
-
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
Hi @rachelli24 and welcome to the Dataiku Community. Have you tried using the Delete/Keep columns by name processor in the Prepare Recipe? Here are some other resources you may find helpful with the Prepare Recipe:
- Data Preparation (Knowledge Base)
I hope this helps!
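If the duplicates can be recognized purely by name, as in the original example where the copies end in "_01", "_02" and so on, the list of columns to delete can be worked out up front. The Python sketch below is one hedged way to do that outside the processor. The function name `suffixed_duplicates` and the two-digit suffix rule are assumptions based on the example names in the question.

```python
import re

# Assumed naming rule: a duplicate is "<base>_NN" where "<base>" is itself a column.
SUFFIX = re.compile(r"^(?P<base>.+)_\d{2}$")

def suffixed_duplicates(columns):
    """Return the column names that look like numbered copies of another column."""
    existing = set(columns)
    to_delete = []
    for name in columns:
        match = SUFFIX.match(name)
        # Only flag the column when its base name really exists; otherwise
        # "things_bought_on_2021_03_07" itself would match (it ends in "_07").
        if match and match.group("base") in existing:
            to_delete.append(name)
    return to_delete

columns = [
    "customer",
    "things_bought_on_2021_03_07",
    "things_bought_on_2021_03_07_01",
    "things_bought_on_2021_03_07_02",
    "things_bought_on_2021_03_08",
]
to_delete = suffixed_duplicates(columns)
```

The check that the base name exists matters because the original columns also end in an underscore and two digits (the day of the month), so a bare pattern on the suffix would remove them too. A name match alone does not prove the values are identical, so it is worth spot-checking a few columns before deleting.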