Filter and Replace on condition based in another column

Options
Marcio
Marcio Registered Posts: 3 ✭✭✭✭

Hey everyone.

I've searched around but couldn't find a solution for this problem.

I have the following data:

DescriptionClass
Cable Repair1
Cable Repair20
Cable Repair20
Cable Repair1
Cable Charge10
Cable Adaptor1

Let's say i want to find and replace the values of Class when it is 20 and the Description is Cable Repair, leaving others as they are.

How would one go and create this condition on DSS?

As fair as i know, the Find and Replace Processor has not parameter to use other columns as conditions, does it? Is there any other processor i could use?

Thanks!

Best Answer

  • MickaelH
    MickaelH Dataiker, Registered Posts: 31 Dataiker
    edited July 17 Answer ✓
    Options

    Hi,

    One of the solution could be to use the Formula step in a Prepare recipe. For example to have the behaviour you requested one could use the following formula (I chose 100 as the replacing value):

    if(and(Description=="Cable Repair", Class==20), 100, Class)

    As a side note, on a Formula step you have the possibility to click on Edit to have a dedicated editor with helpers to build your formula and once the formula is ready you have to click Save to apply the formula from the editor.

    More information on the Formula can be found in the documentation: https://doc.dataiku.com/dss/latest/preparation/processors/formula.html

    I hope this helps.

    Regards,

Answers

  • Marcio
    Marcio Registered Posts: 3 ✭✭✭✭
    Options

    Thanks, that surely helped, I had no idea about the Formula processor.

    Now the problem I'm facing is regarding using that same condition but with a regex match, like so:

    if(and(match(Description, '^Cable\s{0,}Rep.{0,}'), Class== 30), 42, Class)

    But nothing is changed after applying the this processor. I'm using this documentation to try and make it work.

    Thanks.

  • MickaelH
    MickaelH Dataiker, Registered Posts: 31 Dataiker
    edited July 17
    Options

    There are two things regarding the formula you proposed:

    • match returns an arrays of matches so you should consider for example that the size of the array is equals to 1
    • you need to double the \ for the space code: \\s

    So a valid formula would be:

    if(and(match(Description, '^Cable\\s{0,}Rep.{0,}').length()==1, Class==30), 42, Class)

    To easily debug your formula, I suggest to test each term to ensure they are correctly written. For example in this case, one can start outputting the result of match(Description, '^Cable\\s{0,}Rep.{0,}') that is an array and from there adjust the formula.

    I hope this will help you write the proper formula for your use case.

    Regards,

  • Marcio
    Marcio Registered Posts: 3 ✭✭✭✭
    Options

    Thanks, that did the trick! I had no idea about needing the double left slash.

    Thanks for your time!

Setup Info
    Tags
      Help me…