Ready for Dataiku 10? Try out the Crash Course on new features!GET STARTED

Extract the value based on the conditions on rows

Tsurapaneni
Level 3
Level 3
Extract the value based on the conditions on rows

Hi Team, 

I have a use case, where I want to populate a column with a value  extracted from a condition on a row. Provided the example below for better comprehending. 

Consider I have the below dataset 

A.               B.               C

row 1         red               2

row 2         blue             4

row 3.        purple          6

 

Now I want to create a column named "D" within the same dataset which should have the entire column filled with the value from the condition when (Column A == row2 and Column B == 'blue' then extract C)

The value from the above column (if (and (A == 'row2', B == 'blue'), C, ) is 4 so the column D should be filled with the value 4. While I write the above condition I have to provide the else case to replace the rest of the columns but I want to fill the entire column based only on 1 condition. 

NOTE : I want to achieve this only using the visual recipes but not the coding recipe. 

The output dataset should look like :

A.               B.               C           D

row 1         red               2           4

row 2         blue             4           4

row 3.        purple          6           4

 

Hoping to hearing from you soon !

Thanks !

 

0 Kudos
3 Replies
adamnieto
Neuron
Neuron

Hi @Tsurapaneni , 

 

I believe you are on the right track. What I would do is use a formula step in your prep recipe with the following formula:

if(row_number == "row_4" && color == "blue",value_column,"")

 

Then afterwards using the following step in the prep recipe 

 

"impute missing value of column_d with MODE". 

Since the column "D" that would only have the value you are looking for you can just impute with the mode of that column. 

 

Hope this helps!

0 Kudos
Tsurapaneni
Level 3
Level 3
Author

Hi @adamnieto,

Thank you for your response ! It was helpful. I have encountered one more issue during this process which is I would like to have the empty values along with the decimal values in the same column with the datatype decimal.

It is throwing an error as cast or rewrite expression is needed. this is because there is empty value in the column with double datatype.

Do you have any solution to this issue. 

eg:

A       B      C

1.      2.2        1.1

2       1.0       BLANK ('')

3        blank('')   2.4

In the second row of column "C" and 3rd row of column "B" doesn't have any value and replaced the value with a null value and then the process is throwing an error regarding the cast the data type. As the blank value is considered as text or string and it couldn't be merged with the double data type.

Can you please suggest with a possible solution for this issue.

Thanks !

0 Kudos

@Tsurapaneni , apologies for the late response. By any chance do you have the ability to share the step in your recipe that is giving you the error? 

0 Kudos
A banner prompting to get Dataiku DSS