Survey banner
The Dataiku Community is moving to a new home! New posts are now disabled and the community will shortly be in temporary read only mode: LEARN MORE

Matching value from different row based on condition

Solved!
SanderVW
Level 3
Matching value from different row based on condition

Hi, I am trying to add a new column containing a certain value that is the same for each group. The problem is that this value is originally saved in one row of that group and I do not know how to extract this.

For example, if I have the following dataset:

GroupRoleValue
ALead1
AMember2
BLead3
BMember4


I would like to calculate a row that has the value of the 'lead' of the group that the row is in, like this:

GroupRoleValueLeader_value
ALead11
AMember21
BLead33
BMember43


Does anyone know a good method for this? Preferably one that can be implemented in a prepare recipe. I tried to do this using a new column where only the value from 'lead' rows are saved and then fill that column using the previous value but here I run into issues if there are more than one groups due to sorting. Thanks in advance for any advice!


Operating system used: Windows

0 Kudos
1 Solution
me2
Level 3

Another way to do it requires a few steps so might not be the most efficient way:

1) Extract new table that only has the Group and Values only for Role == Lead.  You can use Prepare or Split recipes (Role is the key used to split).  Unless you have a need for the member values, using Split would create a table that isn't used upstream.

2) Join the tables using Left join from old table to new table.  Bring back the value but rename it "Leader_Value".  The join is just using Group so you will get the value regardless of the Role value.

I hope that help!

View solution in original post

7 Replies
Turribeach

Use a Window Recipe and then on aggregations use either Lead or Lag to get to the previous row. Make sure you sort your dataset in the recipe for consistent results. 

SanderVW
Level 3
Author

Thank you, this seems promising. I am running into the issue though that a lag function does not fill the entire group as it only fills up one cell. This also overwrites the 'lead' row with a previous value that it should not get. Is there a way to fill up the entire window with the same value?

Or do you alternatively know if there is a solution that would work in a prepare recipe? This would be best as I would prefer to keep the flow as concise as possible. 

0 Kudos
imanousar
Level 3

You can use a prepare recipe:

Step 1) Create Column with Formula 

 Output Column: Leader_value

 Formula: if( where == 'Lead', Value, '') 

Step 2) Fill empty cells with previous/next value

Column: Leader Value

Fill with previous value

 

Hope it helps

0 Kudos
SanderVW
Level 3
Author

Thanks for your reply, unfortunately I've tried this already and it does not work (for my case at least). This is because the dataset might not be sorted in the correct order.

0 Kudos

You need to set the order in the Window definitions section of the Window recipe.

0 Kudos
me2
Level 3

Another way to do it requires a few steps so might not be the most efficient way:

1) Extract new table that only has the Group and Values only for Role == Lead.  You can use Prepare or Split recipes (Role is the key used to split).  Unless you have a need for the member values, using Split would create a table that isn't used upstream.

2) Join the tables using Left join from old table to new table.  Bring back the value but rename it "Leader_Value".  The join is just using Group so you will get the value regardless of the Role value.

I hope that help!

SanderVW
Level 3
Author

Thank you, I can definitely see this working. I am a bit concerned about clogging up the flow though, as this project will be used by quite a few different people and simplicity in the flow would be best. However, if I cannot find a shorter solution I think I will implement this 🙂

 

Edit: I've ended up implementing this. I used a join recipe on the dataset using the dataset twice as input. In the pre-filter of one of these datasets I filtered for the "lead" rows and on the join I only kept the ''Value'' column from this filtered dataset.

 

Thank you for the helpful suggestion!