Join us on July 16th as we explore real-world Reinforcement Learning Learn more

Filter and Replace on condition based in another column

Level 2
Filter and Replace on condition based in another column

Hey everyone.

I've searched around but couldn't find a solution for this problem.

I have the following data:

DescriptionClass
Cable Repair1
Cable Repair20
Cable Repair20
Cable Repair1
Cable Charge10
Cable Adaptor1

 

Let's say i want to find and replace the values of Class when it is 20 and the Description is Cable Repair, leaving others as they are.

How would one go and create this condition on DSS?

As fair as i know, the Find and Replace Processor has not parameter to use other columns as conditions, does it? Is there any other processor i could use?

 

Thanks!

0 Kudos
4 Replies
Dataiker
Dataiker

Hi,

One of the solution could be to use the Formula step in a Prepare recipe. For example to have the behaviour you requested one could use the following formula (I chose 100 as the replacing value):

if(and(Description=="Cable Repair", Class==20), 100, Class)

As a side note, on a Formula step you have the possibility to click on Edit to have a dedicated editor with helpers to build your formula and once the formula is ready you have to click Save to apply the formula from the editor.

More information on the Formula can be found in the documentation: https://doc.dataiku.com/dss/latest/preparation/processors/formula.html

I hope this helps.

Regards,

Mickaël | R&D at Dataiku
Level 2
Author

Thanks, that surely helped, I had no idea about the Formula processor.

Now the problem I'm facing is regarding using that same condition but with a regex match, like so:

 

if(and(match(Description, '^Cable\s{0,}Rep.{0,}'), Class== 30), 42, Class)

 

But nothing is changed after applying the this processor. I'm using this documentation to try and make it work.

 

Thanks.

0 Kudos
Dataiker
Dataiker

There are two things regarding the formula you proposed:

  • match returns an arrays of matches so you should consider for example that the size of the array is equals to 1
  • you need to double the \ for the space code: \\s

So a valid formula would be: 

if(and(match(Description, '^Cable\\s{0,}Rep.{0,}').length()==1, Class==30), 42, Class)

To easily debug your formula, I suggest to test each term to ensure they are correctly written. For example in this case, one can start outputting the result of  match(Description, '^Cable\\s{0,}Rep.{0,}') that is an array and from there adjust the formula.

I hope this will help you write the proper formula for your use case.

Regards,

Mickaël | R&D at Dataiku
Level 2
Author

Thanks, that did the trick! I had no idea about needing the double left slash.

 

Thanks for your time!