How to fill the column with the value in a row based on a condition
Hi Team,
I have a use case where I have to replace a value based on the row value on another condition. for an instance I have 3 columns (ABC) and have to generate a new column (D)
A. B. C. Output ->. A. B. C. D
2 red. red 2. red. red. 2
3. red. blue 3 red blue 2
4 green. pink 4. green pink. 2
In the above example I have created a new column which is D based on the condition that if ((B==C), A, 000) which gives the value 0 for the rest of the rows which aren't a match but I want the same value throughout the column without explicitly defining the value (eg: using the fill column processor and using the 2 to the value). this might serve now but not in future.
Thanks in advance !
Answers
-
Hi Tsurapaneni,
I am not sure to understand your use case. From what I gathered in your post, you mean to:
1. Find the value in column A such that B == C (assuming either one and only one row matches this condition, or that A always takes the same value for rows matching the condition)
2. Create a new column D filled with the value computed in step 1
If this is indeed your intent to do, you can do so with a short Python recipe, using these lines:
dataset = dataiku.Dataset(YOUR_DATASET_NAME) df = dataset.get_dataframe() df["D"] = df.A[df.B == df.C][0]
Hope this helps!
Agathe
-
expecting the solution in the prep recipe through formulas but not in the python code recipes. I am good with python but I am looking only through formulas in the prep visual recipe.
-
mluu Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 4 ✭✭✭
Hi - were you able to find an answer to this problem? I am interested in a solution that only uses a visual recipe as well.
-
Mateusz Dataiku DSS Core Designer, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 91 ✭✭✭✭✭✭
@mluu
I'm not sure if I understand the issue here as well, but why, if you dont want to use python code above can't just use prepare recipe -> formula and use the syntax mentioned in first post so: if ((Col_B==Col_C), Col_A, 0)?*** Ok, nevermind, I understand it now...:)
If this is exactly the same case as in post nr 1, and you have only one row where this condition B=C is met (or multiple rows, but the rule is always the same, so in this case, its red= red - because otherewise I dont know how that would work), my first thought is to try create a new prepare recipe, remove all rows where this condition is not met and remove all duplicates if necessary so you will end up with one row:
A B C D
2. red. red. 2
and then join column D from this output again with orginal dataset with "cross join" ? - I didn't test it, but that's my idea for a work around.
Thanks,
Mateusz