Matching value from different row based on condition
Hi, I am trying to add a new column containing a certain value that is the same for each group. The problem is that this value is originally saved in one row of that group and I do not know how to extract this.
For example, if I have the following dataset:
Group | Role | Value |
A | Lead | 1 |
A | Member | 2 |
B | Lead | 3 |
B | Member | 4 |
I would like to calculate a row that has the value of the 'lead' of the group that the row is in, like this:
Group | Role | Value | Leader_value |
A | Lead | 1 | 1 |
A | Member | 2 | 1 |
B | Lead | 3 | 3 |
B | Member | 4 | 3 |
Does anyone know a good method for this? Preferably one that can be implemented in a prepare recipe. I tried to do this using a new column where only the value from 'lead' rows are saved and then fill that column using the previous value but here I run into issues if there are more than one groups due to sorting. Thanks in advance for any advice!
Operating system used: Windows
Best Answer
-
Another way to do it requires a few steps so might not be the most efficient way:
1) Extract new table that only has the Group and Values only for Role == Lead. You can use Prepare or Split recipes (Role is the key used to split). Unless you have a need for the member values, using Split would create a table that isn't used upstream.
2) Join the tables using Left join from old table to new table. Bring back the value but rename it "Leader_Value". The join is just using Group so you will get the value regardless of the Role value.
I hope that help!
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,926 Neuron
Use a Window Recipe and then on aggregations use either Lead or Lag to get to the previous row. Make sure you sort your dataset in the recipe for consistent results.
-
Ioannis Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 28 ✭✭✭✭✭
You can use a prepare recipe:
Step 1) Create Column with Formula
Output Column: Leader_value
Formula: if( where == 'Lead', Value, '')
Step 2) Fill empty cells with previous/next value
Column: Leader Value
Fill with previous value
Hope it helps
-
Thanks for your reply, unfortunately I've tried this already and it does not work (for my case at least). This is because the dataset might not be sorted in the correct order.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,926 Neuron
You need to set the order in the Window definitions section of the Window recipe.
-
Thank you, this seems promising. I am running into the issue though that a lag function does not fill the entire group as it only fills up one cell. This also overwrites the 'lead' row with a previous value that it should not get. Is there a way to fill up the entire window with the same value?
Or do you alternatively know if there is a solution that would work in a prepare recipe? This would be best as I would prefer to keep the flow as concise as possible. -
Thank you, I can definitely see this working. I am a bit concerned about clogging up the flow though, as this project will be used by quite a few different people and simplicity in the flow would be best. However, if I cannot find a shorter solution I think I will implement this
Edit: I've ended up implementing this. I used a join recipe on the dataset using the dataset twice as input. In the pre-filter of one of these datasets I filtered for the "lead" rows and on the join I only kept the ''Value'' column from this filtered dataset.
Thank you for the helpful suggestion!