Is it possible to use variables in a regex in a Prepare recipe?

Charly
Charly Partner, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 13 Partner

Greetings!

I would like to use a variable in a regex in a Prepare recipe. It's easy in Python, but I would like to do it without code.

Example: my variable is "date" with the value "20230105". I have columns named after dates, and I only want to keep those ending with this date. I use the "Delete/keep columns by name" processor and choose the regex option... then what? Since $ is a special character in regex, does that mean I can't use it to call my variable?
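For reference, the Python version of this filter could look like the minimal sketch below. The column names and the `variables` dict are made up for illustration; in a DSS Python recipe the dict would presumably come from the project variables (for instance via `dataiku.get_custom_variables()`), which is an assumption about the setup and not something shown in this thread.

```python
import re

def keep_columns_ending_with(columns, suffix):
    """Return the column names that end with the given suffix.

    re.escape() guards against regex metacharacters inside the variable
    value; the trailing "$" is the regex end-of-string anchor.
    """
    pattern = re.compile(re.escape(suffix) + r"$")
    return [name for name in columns if pattern.search(name)]

# Hypothetical stand-in for the project variables (assumption: in DSS this
# dict could be obtained with dataiku.get_custom_variables()).
variables = {"date": "20230105"}

# Hypothetical column names, for illustration only.
columns = ["sales_20230104", "sales_20230105", "stock_20230105", "id"]

kept = keep_columns_ending_with(columns, variables["date"])
print(kept)  # ['sales_20230105', 'stock_20230105']
```

Building the pattern by string concatenation keeps the variable value and the `$` anchor clearly separate, so there is no ambiguity between the two uses of `$`.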

Sorry for my poor English. Have a nice day and a happy new year!

Best Answer

  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
    Answer ✓

    Hi @Charly,

    Indeed, variables don't work in the regex of the Drop/Keep rows Prepare recipe processor. Instead, you can add a formula step and reference your variable in the formula. You can then drop any rows that don't meet the formula condition with a subsequent drop/keep rows step. For example:

    Screen Shot 2023-01-06 at 4.36.28 PM.png

    You can click on "Open editor panel" to test out your variable syntax and condition:

    Screen Shot 2023-01-06 at 4.36.41 PM.png

    You can then use the filter processor/keep rows processor to keep all rows with the value "true". Let me know if this approach makes sense to you.

    Thanks,
    Sarina 

Answers

  • Charly

    Hi @SarinaS, thanks for the reply!

    It was indeed a solution I had thought about (even though I was talking about the column names, not the rows). I was mostly curious about whether variables can be used in a regex, and you answered my question when you said they can't.

    Do you know if there is an exhaustive list of places where variables can be used? I often wonder: when creating partitions, in a dataset name, in a check... I have to admit I'm not always sure.

    Best regards,

    Charly

  • Sarina

    Hi @Charly,

    That makes sense; it's not always clear where variables can be used, especially in the UI. The best document that describes where you can use variables is this one. For the most part, variables can be used in code and in formula steps across DSS. In the UI itself, they can be used in some places (dataset settings, for example), along with other locations that explicitly state that variables are supported. There are some places where variable expansion hasn't been implemented, though, so if a variable doesn't work in a specific location, a formula or code solution is usually the simplest approach.
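As an illustration of the code fallback, here is a minimal Python sketch that performs the `${name}` expansion by hand before compiling a regex. The helper `expand_variables` and the sample values are hypothetical and are not part of DSS; the sketch only assumes that variables are available as a plain dict.

```python
import re

# Matches placeholders of the form ${name}.
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def expand_variables(template, variables):
    """Replace each ${name} in the template with the escaped variable value.

    Hypothetical helper: raises KeyError if a referenced variable is missing.
    """
    def substitute(match):
        name = match.group(1)
        # Escape the value so it is matched literally inside the regex.
        return re.escape(str(variables[name]))
    return _PLACEHOLDER.sub(substitute, template)

# Hypothetical variable and pattern, for illustration only.
variables = {"date": "20230105"}
pattern = expand_variables(r".*_${date}$", variables)

print(pattern)  # .*_20230105$
print(bool(re.search(pattern, "sales_20230105")))  # True
print(bool(re.search(pattern, "sales_20230104")))  # False
```

Only `$` followed by `{name}` is treated as a placeholder, so the final `$` keeps its usual meaning as the end-of-string anchor.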

    Thanks,
    Sarina
