Extract the value based on the conditions on rows

Options
Tsurapaneni
Tsurapaneni Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 41 ✭✭✭✭

Hi Team,

I have a use case, where I want to populate a column with a value extracted from a condition on a row. Provided the example below for better comprehending.

Consider I have the below dataset

A. B. C

row 1 red 2

row 2 blue 4

row 3. purple 6

Now I want to create a column named "D" within the same dataset which should have the entire column filled with the value from the condition when (Column A == row2 and Column B == 'blue' then extract C)

The value from the above column (if (and (A == 'row2', B == 'blue'), C, ) is 4 so the column D should be filled with the value 4. While I write the above condition I have to provide the else case to replace the rest of the columns but I want to fill the entire column based only on 1 condition.

NOTE : I want to achieve this only using the visual recipes but not the coding recipe.

The output dataset should look like :

A. B. C D

row 1 red 2 4

row 2 blue 4 4

row 3. purple 6 4

Hoping to hearing from you soon !

Thanks !

Answers

  • adamnieto
    adamnieto Neuron 2020, Neuron, Registered, Neuron 2021, Neuron 2022, Neuron 2023 Posts: 87 Neuron
    edited July 17
    Options

    Hi @Tsurapaneni
    ,

    I believe you are on the right track. What I would do is use a formula step in your prep recipe with the following formula:

    if(row_number == "row_4" && color == "blue",value_column,"")

    Then afterwards using the following step in the prep recipe

    "impute missing value of column_d with MODE".

    Since the column "D" that would only have the value you are looking for you can just impute with the mode of that column.

    Hope this helps!

  • Tsurapaneni
    Tsurapaneni Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 41 ✭✭✭✭
    Options

    Hi @adamnieto
    ,

    Thank you for your response ! It was helpful. I have encountered one more issue during this process which is I would like to have the empty values along with the decimal values in the same column with the datatype decimal.

    It is throwing an error as cast or rewrite expression is needed. this is because there is empty value in the column with double datatype.

    Do you have any solution to this issue.

    eg:

    A B C

    1. 2.2 1.1

    2 1.0 BLANK ('')

    3 blank('') 2.4

    In the second row of column "C" and 3rd row of column "B" doesn't have any value and replaced the value with a null value and then the process is throwing an error regarding the cast the data type. As the blank value is considered as text or string and it couldn't be merged with the double data type.

    Can you please suggest with a possible solution for this issue.

    Thanks !

  • adamnieto
    adamnieto Neuron 2020, Neuron, Registered, Neuron 2021, Neuron 2022, Neuron 2023 Posts: 87 Neuron
    Options

    @Tsurapaneni
    , apologies for the late response. By any chance do you have the ability to share the step in your recipe that is giving you the error?

Setup Info
    Tags
      Help me…