Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Added on May 13, 2020 3:07PM
Likes: 1
Replies: 8
Hi,
I'm trying to create a column based on another column values. I'm aware I could easily achieve this by the builtin processor through the following steps:
- copy/duplicate the existing column,
- and do find-and-replace on the new created column according to the previous column values.
But I wish to do this with the formula processor instead, with the "if-statement" approach. Perhaps I may illustrate them like the following :
existing_column | new_column |
BMW | 1 |
Toyota | 2 |
Honda | 3 |
I'd like to categorize the manufacturer brand on left column to values on the right column. So far, I've been doing it the new_column formula creation like the following:
if (existing_column == "BMW", "1", else existing_column == "Toyota", "2", else existing_column == "Honda", "3", end)
But the code didn't work, could anyone point me to a somewhat similar examples? Been looking for it for awhile, and any help would be greatly appreciated.
Thanks,
GLN
Hi @gerryleonugroho
,
In your case, you need to use nested if-statements.
Here would be the correct usage for your example.
if(existing_column == "BMW", 1,
if(existing_column == "Toyota", 2,
if(existing_column == "Honda", 3,"")))
Have a nice day!
Hi @dimitri
, thanks for the help, the solution worked out great! Though I'am just wondering, would there be a more "elegant" way to solve this? Assuming I have a little over then a hundred list of car manufacturers to match, since that would create a lot of nested closing brackets at the end.
Thanks,
GLN
Hi @gerryleonugroho
,
An alternative way to achieve that in a more convenient manner when you have a lot of values would be to use a join recipe with a "reference table".
The "reference table" could be an editable dataset that you create in DSS, with two columns, one containing all the values of the original column, and the second containing all the corresponding values.
Then create a join recipe (using either a left or a right join depending on the order of the original dataset), and you will be able to generate a new dataset having a new column containing the matching values.
Have a nice day!
Hi @dimitri
,
This is getting interesting, especially with the second step of having a reference table, where you can directly create in DSS, without having to upload them in the first place and have them edited/updated according to your requirements.
I didn't even know the feature exist, do you mind showing the steps to create one? I haven't been able to find them in the documentation/academy sections of the Dataiku's site. While for the join recipe, it's something I had in mind previously, it just that I took a different route to accomplish the journey.
Cheers,
GLN
Hi @gerryleonugroho
,
You can create an editable dataset right from the flow by clicking on "+ dataset" then "editable".
Select whether you want to start from scratch, or whether you want to prepopulate it with existing values from a file or another dataset, set its name and then click on "Create".
Once created, when you open this dataset there is a new tab "Edit", from which you can edit the columns and the values. (Right-click on an existing column to insert a new one.)
Have a nice day!
Here's the challenge am facing now.
I have 2 columns and I attached the image.I want to change or create column 2 based on Column 1.
Is it possible to do by using Fomula in prepare recipe.
I have 2 columns and I attached the details as well. Let me know how to create the output column 2 using column 1 and 2 using formula or any other way