Filter and Replace on condition based in another column
Hey everyone.
I've searched around but couldn't find a solution for this problem.
I have the following data:
Description | Class |
Cable Repair | 1 |
Cable Repair | 20 |
Cable Repair | 20 |
Cable Repair | 1 |
Cable Charge | 10 |
Cable Adaptor | 1 |
Let's say i want to find and replace the values of Class when it is 20 and the Description is Cable Repair, leaving others as they are.
How would one go and create this condition on DSS?
As fair as i know, the Find and Replace Processor has not parameter to use other columns as conditions, does it? Is there any other processor i could use?
Thanks!
Best Answer
-
Hi,
One of the solution could be to use the Formula step in a Prepare recipe. For example to have the behaviour you requested one could use the following formula (I chose 100 as the replacing value):
if(and(Description=="Cable Repair", Class==20), 100, Class)
As a side note, on a Formula step you have the possibility to click on Edit to have a dedicated editor with helpers to build your formula and once the formula is ready you have to click Save to apply the formula from the editor.
More information on the Formula can be found in the documentation: https://doc.dataiku.com/dss/latest/preparation/processors/formula.html
I hope this helps.
Regards,
Answers
-
Thanks, that surely helped, I had no idea about the Formula processor.
Now the problem I'm facing is regarding using that same condition but with a regex match, like so:
if(and(match(Description, '^Cable\s{0,}Rep.{0,}'), Class== 30), 42, Class)
But nothing is changed after applying the this processor. I'm using this documentation to try and make it work.
Thanks.
-
There are two things regarding the formula you proposed:
- match returns an arrays of matches so you should consider for example that the size of the array is equals to 1
- you need to double the \ for the space code: \\s
So a valid formula would be:
if(and(match(Description, '^Cable\\s{0,}Rep.{0,}').length()==1, Class==30), 42, Class)
To easily debug your formula, I suggest to test each term to ensure they are correctly written. For example in this case, one can start outputting the result of match(Description, '^Cable\\s{0,}Rep.{0,}') that is an array and from there adjust the formula.
I hope this will help you write the proper formula for your use case.
Regards,
-
Thanks, that did the trick! I had no idea about needing the double left slash.
Thanks for your time!