Matching value from different row based on condition

SanderVW Registered Posts: 41 ✭✭✭✭

Hi, I am trying to add a new column containing a certain value that is the same for each group. The problem is that this value is originally saved in one row of that group and I do not know how to extract this.

For example, if I have the following dataset:


I would like to calculate a row that has the value of the 'lead' of the group that the row is in, like this:


Does anyone know a good method for this? Preferably one that can be implemented in a prepare recipe. I tried to do this using a new column where only the value from 'lead' rows are saved and then fill that column using the previous value but here I run into issues if there are more than one groups due to sorting. Thanks in advance for any advice!

Operating system used: Windows

Best Answer

  • me2
    me2 Registered Posts: 48 ✭✭✭✭✭
    Answer ✓

    Another way to do it requires a few steps so might not be the most efficient way:

    1) Extract new table that only has the Group and Values only for Role == Lead. You can use Prepare or Split recipes (Role is the key used to split). Unless you have a need for the member values, using Split would create a table that isn't used upstream.

    2) Join the tables using Left join from old table to new table. Bring back the value but rename it "Leader_Value". The join is just using Group so you will get the value regardless of the Role value.

    I hope that help!


  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,730 Neuron

    Use a Window Recipe and then on aggregations use either Lead or Lag to get to the previous row. Make sure you sort your dataset in the recipe for consistent results.

  • Ioannis
    Ioannis Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 28 ✭✭✭✭✭

    You can use a prepare recipe:

    Step 1) Create Column with Formula

    Output Column: Leader_value

    Formula: if( where == 'Lead', Value, '')

    Step 2) Fill empty cells with previous/next value

    Column: Leader Value

    Fill with previous value

    Hope it helps

  • SanderVW
    SanderVW Registered Posts: 41 ✭✭✭✭

    Thanks for your reply, unfortunately I've tried this already and it does not work (for my case at least). This is because the dataset might not be sorted in the correct order.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,730 Neuron

    You need to set the order in the Window definitions section of the Window recipe.

  • SanderVW
    SanderVW Registered Posts: 41 ✭✭✭✭

    Thank you, this seems promising. I am running into the issue though that a lag function does not fill the entire group as it only fills up one cell. This also overwrites the 'lead' row with a previous value that it should not get. Is there a way to fill up the entire window with the same value?

    Or do you alternatively know if there is a solution that would work in a prepare recipe? This would be best as I would prefer to keep the flow as concise as possible.

  • SanderVW
    SanderVW Registered Posts: 41 ✭✭✭✭

    Thank you, I can definitely see this working. I am a bit concerned about clogging up the flow though, as this project will be used by quite a few different people and simplicity in the flow would be best. However, if I cannot find a shorter solution I think I will implement this

    Edit: I've ended up implementing this. I used a join recipe on the dataset using the dataset twice as input. In the pre-filter of one of these datasets I filtered for the "lead" rows and on the join I only kept the ''Value'' column from this filtered dataset.

    Thank you for the helpful suggestion!

Setup Info
      Help me…