Creating a (categorical) column based on another column values?

Gerry Leo · May 2020

Hi,

I'm trying to create a column based on another column values. I'm aware I could easily achieve this by the builtin processor through the following steps:

- copy/duplicate the existing column,

- and do find-and-replace on the new created column according to the previous column values.

But I wish to do this with the formula processor instead, with the "if-statement" approach. Perhaps I may illustrate them like the following :

existing_column	new_column
BMW	1
Toyota	2
Honda	3

I'd like to categorize the manufacturer brand on left column to values on the right column. So far, I've been doing it the new_column formula creation like the following:

if (existing_column == "BMW", "1", 
else
    existing_column == "Toyota", "2",
else
   existing_column == "Honda", "3",
end)

But the code didn't work, could anyone point me to a somewhat similar examples? Been looking for it for awhile, and any help would be greatly appreciated.

Thanks,

GLN

dimitri · May 2020

Hi @gerryleonugroho
,

In your case, you need to use nested if-statements.

Here would be the correct usage for your example.

if(existing_column == "BMW", 1, 
if(existing_column == "Toyota", 2, 
if(existing_column == "Honda", 3,"")))

Have a nice day!

Gerry Leo · May 2020

Hi @dimitri
, thanks for the help, the solution worked out great! Though I'am just wondering, would there be a more "elegant" way to solve this? Assuming I have a little over then a hundred list of car manufacturers to match, since that would create a lot of nested closing brackets at the end.

Thanks,

GLN

dimitri · May 2020

Hi @gerryleonugroho
,

An alternative way to achieve that in a more convenient manner when you have a lot of values would be to use a join recipe with a "reference table".

join recipe.png

The "reference table" could be an editable dataset that you create in DSS, with two columns, one containing all the values of the original column, and the second containing all the corresponding values.

editable dataset.png

Then create a join recipe (using either a left or a right join depending on the order of the original dataset), and you will be able to generate a new dataset having a new column containing the matching values.

recipe details.png

Have a nice day!

Gerry Leo · May 2020

Hi @dimitri
,

This is getting interesting, especially with the second step of having a reference table, where you can directly create in DSS, without having to upload them in the first place and have them edited/updated according to your requirements.

I didn't even know the feature exist, do you mind showing the steps to create one? I haven't been able to find them in the documentation/academy sections of the Dataiku's site. While for the join recipe, it's something I had in mind previously, it just that I took a different route to accomplish the journey.

Cheers,

GLN

dimitri · May 2020

Hi @gerryleonugroho
,

You can create an editable dataset right from the flow by clicking on "+ dataset" then "editable".

new editable dataset.png

Select whether you want to start from scratch, or whether you want to prepopulate it with existing values from a file or another dataset, set its name and then click on "Create".

create editable dataset.png

Once created, when you open this dataset there is a new tab "Edit", from which you can edit the columns and the values. (Right-click on an existing column to insert a new one.)

edit dataset.png

Have a nice day!

Gerry Leo · May 2020

Hi @dimitri
,

This is looking good! Thanks for all the explanation, really appreciate all the efforts. You too, have a nice day, and please be safe during the pandemic.

Cheers,

GLN

Kkmakv · June 2022

Here's the challenge am facing now.

I have 2 columns and I attached the image.I want to change or create column 2 based on Column 1.

Is it possible to do by using Fomula in prepare recipe.

Kkmakv · June 2022

I have 2 columns and I attached the details as well. Let me know how to create the output column 2 using column 1 and 2 using formula or any other way

Creating a (categorical) column based on another column values?

Tags

Welcome!

Best Answer

Answers

Welcome!

Welcome!

Quick Links

Categories

Sign up to take part

Creating a (categorical) column based on another column values?

Tags

Welcome!

Best Answer

Answers

Welcome!

Welcome!

Quick Links

Categories