How to fill the column with the value in a row based on a condition

Tsurapaneni
Level 3
How to fill the column with the value in a row based on a condition

Hi Team,

I have a use case where I have to replace a value based on the row value on another condition. for an instance I have 3 columns (ABC) and have to generate a new column (D)

A.     B.      C.                         Output ->.   A.     B.         C.       D     

2      red.    red                                          2.      red.      red.     2

3.     red.    blue                                         3      red        blue    2

4     green.  pink                                        4.     green     pink.   2

In the above example I have created a new column which is D based on the condition that if ((B==C), A, 000) which gives the value 0 for the rest of the rows which aren't a match but I want the same value throughout the column without explicitly defining the value (eg: using the fill column processor and using the 2 to the value). this might serve now but not in future. 

 

Thanks in advance !

 

0 Kudos
4 Replies
AgatheG
Dataiker

Hi Tsurapaneni,

I am not sure to understand your use case. From what I gathered in your post, you mean to:

1. Find the value in column A such that B == C (assuming either one and only one row matches this condition, or that A always takes the same value for rows matching the condition)

2. Create a new column D filled with the value computed in step 1

If this is indeed your intent to do, you can do so with a short Python recipe, using these lines:

dataset = dataiku.Dataset(YOUR_DATASET_NAME)
df = dataset.get_dataframe()

df["D"] = df.A[df.B == df.C][0]

 

Hope this helps!

 

Agathe

0 Kudos
Tsurapaneni
Level 3
Author

expecting the solution in the prep recipe through formulas but not in the python code recipes. I am good with python but I am looking only through formulas in the prep visual recipe. 

0 Kudos
mluu
Level 1

@Tsurapaneni @AgatheG 

 

Hi - were you able to find an answer to this problem? I am interested in a solution that only uses a visual recipe as well.

0 Kudos
emate
Level 5

@mluu I'm not sure if I understand the issue here as well, but why, if you dont want to use python code above can't just use prepare recipe -> formula and use the syntax mentioned in first post so: if ((Col_B==Col_C), Col_A, 0)?

*** Ok, nevermind, I understand it now...:) 

If this is exactly the same case as in post nr 1, and you have only one row where this condition B=C is met (or multiple rows, but the rule is always the same, so in this case, its red= red - because otherewise I dont know how that would work), my first thought is to try create a new prepare recipe, remove all rows where this condition is not met and remove all duplicates if necessary so you will end up with one row:

A       B           C        D

 2.      red.      red.     2

and then join column D from this output again with orginal dataset with "cross join" ? - I didn't test it, but that's my idea for a work around.

Thanks,

Mateusz

0 Kudos