How to fill the column with the value in a row based on a condition

Options
Tsurapaneni
Tsurapaneni Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 41 ✭✭✭✭

Hi Team,

I have a use case where I have to replace a value based on the row value on another condition. for an instance I have 3 columns (ABC) and have to generate a new column (D)

A. B. C. Output ->. A. B. C. D

2 red. red 2. red. red. 2

3. red. blue 3 red blue 2

4 green. pink 4. green pink. 2

In the above example I have created a new column which is D based on the condition that if ((B==C), A, 000) which gives the value 0 for the rest of the rows which aren't a match but I want the same value throughout the column without explicitly defining the value (eg: using the fill column processor and using the 2 to the value). this might serve now but not in future.

Thanks in advance !

Answers

  • AgatheG
    AgatheG Dataiker Posts: 37 Dataiker
    edited July 17
    Options

    Hi Tsurapaneni,

    I am not sure to understand your use case. From what I gathered in your post, you mean to:

    1. Find the value in column A such that B == C (assuming either one and only one row matches this condition, or that A always takes the same value for rows matching the condition)

    2. Create a new column D filled with the value computed in step 1

    If this is indeed your intent to do, you can do so with a short Python recipe, using these lines:

    dataset = dataiku.Dataset(YOUR_DATASET_NAME)
    df = dataset.get_dataframe()
    
    df["D"] = df.A[df.B == df.C][0]

    Hope this helps!

    Agathe

  • Tsurapaneni
    Tsurapaneni Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 41 ✭✭✭✭
    Options

    expecting the solution in the prep recipe through formulas but not in the python code recipes. I am good with python but I am looking only through formulas in the prep visual recipe.

  • mluu
    mluu Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 4 ✭✭✭
    Options

    @Tsurapaneni
    @AgatheG

    Hi - were you able to find an answer to this problem? I am interested in a solution that only uses a visual recipe as well.

  • emate
    emate Dataiku DSS Core Designer, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 91 ✭✭✭✭✭✭
    Options

    @mluu
    I'm not sure if I understand the issue here as well, but why, if you dont want to use python code above can't just use prepare recipe -> formula and use the syntax mentioned in first post so: if ((Col_B==Col_C), Col_A, 0)?

    *** Ok, nevermind, I understand it now...:)

    If this is exactly the same case as in post nr 1, and you have only one row where this condition B=C is met (or multiple rows, but the rule is always the same, so in this case, its red= red - because otherewise I dont know how that would work), my first thought is to try create a new prepare recipe, remove all rows where this condition is not met and remove all duplicates if necessary so you will end up with one row:

    A B C D

    2. red. red. 2

    and then join column D from this output again with orginal dataset with "cross join" ? - I didn't test it, but that's my idea for a work around.

    Thanks,

    Mateusz

Setup Info
    Tags
      Help me…