Creating a (categorical) column based on another column values?

Options
Gerry Leo
Gerry Leo Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 45 ✭✭✭✭✭

Hi,

I'm trying to create a column based on another column values. I'm aware I could easily achieve this by the builtin processor through the following steps:

- copy/duplicate the existing column,

- and do find-and-replace on the new created column according to the previous column values.

But I wish to do this with the formula processor instead, with the "if-statement" approach. Perhaps I may illustrate them like the following :

existing_columnnew_column
BMW1
Toyota2
Honda3

I'd like to categorize the manufacturer brand on left column to values on the right column. So far, I've been doing it the new_column formula creation like the following:

if (existing_column == "BMW", "1", else    existing_column == "Toyota", "2",else   existing_column == "Honda", "3",end)

But the code didn't work, could anyone point me to a somewhat similar examples? Been looking for it for awhile, and any help would be greatly appreciated.

Thanks,

GLN

Best Answer

  • dimitri
    dimitri Dataiker, Product Ideas Manager Posts: 33 Dataiker
    Answer ✓
    Options

    Hi @gerryleonugroho
    ,

    In your case, you need to use nested if-statements.

    Here would be the correct usage for your example.

    if(existing_column == "BMW", 1, 
    if(existing_column == "Toyota", 2,
    if(existing_column == "Honda", 3,"")))

    Have a nice day!

Answers

  • Gerry Leo
    Gerry Leo Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 45 ✭✭✭✭✭
    Options

    Hi @dimitri
    , thanks for the help, the solution worked out great! Though I'am just wondering, would there be a more "elegant" way to solve this? Assuming I have a little over then a hundred list of car manufacturers to match, since that would create a lot of nested closing brackets at the end.

    Thanks,

    GLN

  • dimitri
    dimitri Dataiker, Product Ideas Manager Posts: 33 Dataiker
    Options

    Hi @gerryleonugroho
    ,

    An alternative way to achieve that in a more convenient manner when you have a lot of values would be to use a join recipe with a "reference table".

    join recipe.png

    The "reference table" could be an editable dataset that you create in DSS, with two columns, one containing all the values of the original column, and the second containing all the corresponding values.

    editable dataset.png

    Then create a join recipe (using either a left or a right join depending on the order of the original dataset), and you will be able to generate a new dataset having a new column containing the matching values.

    recipe details.png

    result.png

    Have a nice day!

  • Gerry Leo
    Gerry Leo Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 45 ✭✭✭✭✭
    Options

    Hi @dimitri
    ,

    This is getting interesting, especially with the second step of having a reference table, where you can directly create in DSS, without having to upload them in the first place and have them edited/updated according to your requirements.

    I didn't even know the feature exist, do you mind showing the steps to create one? I haven't been able to find them in the documentation/academy sections of the Dataiku's site. While for the join recipe, it's something I had in mind previously, it just that I took a different route to accomplish the journey.

    Cheers,

    GLN

  • dimitri
    dimitri Dataiker, Product Ideas Manager Posts: 33 Dataiker
    Options

    Hi @gerryleonugroho
    ,

    You can create an editable dataset right from the flow by clicking on "+ dataset" then "editable".

    new editable dataset.png

    Select whether you want to start from scratch, or whether you want to prepopulate it with existing values from a file or another dataset, set its name and then click on "Create".

    create editable dataset.png

    Once created, when you open this dataset there is a new tab "Edit", from which you can edit the columns and the values. (Right-click on an existing column to insert a new one.)

    edit dataset.png

    Have a nice day!

  • Gerry Leo
    Gerry Leo Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 45 ✭✭✭✭✭
    Options

    Hi @dimitri
    ,

    This is looking good! Thanks for all the explanation, really appreciate all the efforts. You too, have a nice day, and please be safe during the pandemic.

    Cheers,

    GLN

  • Kkmakv
    Kkmakv Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 3 ✭✭✭
    Options

    Here's the challenge am facing now.

    I have 2 columns and I attached the image.I want to change or create column 2 based on Column 1.

    Is it possible to do by using Fomula in prepare recipe.

  • Kkmakv
    Kkmakv Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 3 ✭✭✭
    Options

    I have 2 columns and I attached the details as well. Let me know how to create the output column 2 using column 1 and 2 using formula or any other way

Setup Info
    Tags
      Help me…