
Removing duplicate columns

frotograf
Level 2

Dear community,

I have such a case:

- I have a large database that needs cleaning.

- While performing the typical cleaning activities (parsing etc.) I discovered that numerous columns are just duplicates of one another, only with different names. Judging by a basic analysis, there are hundreds of them.

Example: one column is named "things_bought_on_2021_03_07", and its duplicates have names like "things_bought_on_2021_03_07_01" and "things_bought_on_2021_03_07_02".

I don't know of any way to deal with this in Dataiku. Working on duplicate rows would be easier 😉 (I do not have duplicate rows in this one.)

Thank you! 

7 Replies
CoreyS
Community Manager

@frotograf welcome to the community! You can achieve this through a Prepare Recipe. Since you are new to DSS, I would recommend utilizing the Dataiku Academy Core Designer Learning Path. The 102 course in the path goes over Prepare Recipes.

I hope this helps!

 

Looking for more resources to help you use Dataiku effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as ‘Accepted Solution’ to help others like you!
frotograf
Level 2
Author

I did not think of the Prepare recipe. Thanks for the tip. I've been using various resources from Dataiku and I totally love it. As a non-coder (who understands Python basics), this tool has been a godsend for me!

CoreyS
Community Manager

Thank you for the feedback! Yes, I am very much a clicker myself, and the Prepare Recipe is a really powerful tool in DSS.

frotograf
Level 2
Author

I'll get back to you if I don't find what I'm looking for there, so no worries. Maybe somebody else will have the same problem! 🙂

frotograf
Level 2
Author

Thanks for the link @CoreyS. I've gone through the Academy's 101 and part of 102, but I would still have to click through the dataset manually. The more advanced options are for Python users, and as a no-code user I cannot work out the answer to my question.

Still, the tool is fantastic, and I got to refresh my memory of some of the DSS functions. I also see some new additions to the software since I started using it this year!

emate
Neuron

Hi @frotograf 

Could you please show me a 'fake' dataset with 3-4 cases where I can see what the duplicates look like, and explain in more detail which values you want to remove? It can be a screenshot or a small portion of the data in an xlsx file.

 

frotograf
Level 2
Author

@emate thanks for the post! 

It looks like what is shown in the screenshot. The columns are the same and the rows have the same values; the only difference between them is the number at the end of the column name (the ones I marked with circles).
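
For anyone landing here who can use a bit of Python (for example in a notebook or a Python recipe), the case described above, columns with identical values whose names differ only by a numeric suffix, could be handled roughly as follows. This is only a minimal pandas sketch, not a built-in Dataiku feature. The function name `drop_duplicate_columns` and the sample data are made up for illustration, and it assumes the data fits in memory and that column names are unique.

```python
import pandas as pd


def drop_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the first of each group of columns with identical values."""
    kept = []  # names of the columns kept so far, in original order
    for name in df.columns:
        # Series.equals compares values and dtype; NaNs in the same
        # positions count as equal.
        if not any(df[name].equals(df[other]) for other in kept):
            kept.append(name)
    return df[kept]


# Tiny made-up dataset; column names follow the pattern from the question.
sample = pd.DataFrame({
    "things_bought_on_2021_03_07": [1, 2, 3],
    "things_bought_on_2021_03_07_01": [1, 2, 3],
    "things_bought_on_2021_03_07_02": [1, 2, 3],
    "things_bought_on_2021_03_08": [4, 5, 6],
})

cleaned = drop_duplicate_columns(sample)
print(list(cleaned.columns))
```

The sketch compares columns by content rather than by name, so it does not rely on the `_01` / `_02` suffix pattern. Note that two columns holding the same values with different types (say, integers versus text) would not be treated as duplicates here.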
