Creating a (categorical) column based on another column values?
Hi,
I'm trying to create a column based on another column values. I'm aware I could easily achieve this by the builtin processor through the following steps:
- copy/duplicate the existing column,
- and do find-and-replace on the new created column according to the previous column values.
But I wish to do this with the formula processor instead, with the "if-statement" approach. Perhaps I may illustrate them like the following :
existing_column | new_column |
BMW | 1 |
Toyota | 2 |
Honda | 3 |
I'd like to categorize the manufacturer brand on left column to values on the right column. So far, I've been doing it the new_column formula creation like the following:
if (existing_column == "BMW", "1", else existing_column == "Toyota", "2", else existing_column == "Honda", "3", end)
But the code didn't work, could anyone point me to a somewhat similar examples? Been looking for it for awhile, and any help would be greatly appreciated.
Thanks,
GLN
Best Answer
-
Hi @gerryleonugroho
,
In your case, you need to use nested if-statements.Here would be the correct usage for your example.
if(existing_column == "BMW", 1,
if(existing_column == "Toyota", 2,
if(existing_column == "Honda", 3,"")))Have a nice day!
Answers
-
Gerry Leo Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 45 ✭✭✭✭✭
Hi @dimitri
, thanks for the help, the solution worked out great! Though I'am just wondering, would there be a more "elegant" way to solve this? Assuming I have a little over then a hundred list of car manufacturers to match, since that would create a lot of nested closing brackets at the end.Thanks,
GLN
-
Hi @gerryleonugroho
,An alternative way to achieve that in a more convenient manner when you have a lot of values would be to use a join recipe with a "reference table".
The "reference table" could be an editable dataset that you create in DSS, with two columns, one containing all the values of the original column, and the second containing all the corresponding values.
Then create a join recipe (using either a left or a right join depending on the order of the original dataset), and you will be able to generate a new dataset having a new column containing the matching values.
Have a nice day!
-
Gerry Leo Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 45 ✭✭✭✭✭
Hi @dimitri
,This is getting interesting, especially with the second step of having a reference table, where you can directly create in DSS, without having to upload them in the first place and have them edited/updated according to your requirements.
I didn't even know the feature exist, do you mind showing the steps to create one? I haven't been able to find them in the documentation/academy sections of the Dataiku's site. While for the join recipe, it's something I had in mind previously, it just that I took a different route to accomplish the journey.
Cheers,
GLN
-
Hi @gerryleonugroho
,You can create an editable dataset right from the flow by clicking on "+ dataset" then "editable".
Select whether you want to start from scratch, or whether you want to prepopulate it with existing values from a file or another dataset, set its name and then click on "Create".
Once created, when you open this dataset there is a new tab "Edit", from which you can edit the columns and the values. (Right-click on an existing column to insert a new one.)
Have a nice day!
-
Kkmakv Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 3 ✭✭✭
Here's the challenge am facing now.
I have 2 columns and I attached the image.I want to change or create column 2 based on Column 1.
Is it possible to do by using Fomula in prepare recipe.
-
Kkmakv Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 3 ✭✭✭
I have 2 columns and I attached the details as well. Let me know how to create the output column 2 using column 1 and 2 using formula or any other way