
Is it possible to use variables in regex in a prepare recipe ?

Solved!
Charly
Level 1

Greetings!

I would like to use a variable in a regex in a Prepare recipe. It's easy in Python, but I would like to do it without code.

Example: my variable is "date" and its value is "20230105". I have columns named after dates and I only want to keep those ending with this date. I use the processor "Delete/keep columns by name" and choose the regex option... then what? $ is a keyword in regex, so can't I use it to call my variable?

Sorry for my poor English, have a nice day and a happy new year! 😃

1 Solution
SarinaS
Dataiker

Hi @Charly,

Indeed, variables don't work in the regex of the Drop/Keep rows processor in a Prepare recipe. Instead, you can add a formula step that references your variable, then use a subsequent drop/keep rows step to drop any rows that don't meet the formula's condition. For example:

[Screenshot: Screen Shot 2023-01-06 at 4.36.28 PM.png]

You can click on "Open editor panel" to test out your variable syntax and condition:

[Screenshot: Screen Shot 2023-01-06 at 4.36.41 PM.png]

You can then use the filter/keep rows processor to keep all rows where the value is "true". Let me know if this approach makes sense to you.

Thanks,
Sarina 
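
For the column-name case in the original question, the code route could look roughly like the sketch below. It is plain Python, not DSS-specific: the `variables` dict is a hypothetical stand-in for the project variables, and the column list is made up for illustration. Inside DSS you would read both through the Dataiku Python API instead.

```python
import re

# Hypothetical stand-in for project variables
# (assumption: "date" holds "20230105", as in the question).
variables = {"date": "20230105"}

# Made-up column names, for illustration only.
columns = ["id", "sales_20230104", "sales_20230105", "stock_20230105"]


def columns_ending_with(names, suffix):
    """Return the names that end with `suffix`, matched literally."""
    # re.escape keeps any regex metacharacters in the variable value literal;
    # the trailing $ anchors the match at the end of the column name.
    pattern = re.compile(re.escape(suffix) + r"$")
    return [name for name in names if pattern.search(name)]


kept = columns_ending_with(columns, variables["date"])
print(kept)
```

Escaping the value before building the pattern is a deliberate choice here: the variable is treated as literal text, so a value containing `.` or `$` would not change the meaning of the regex.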

 

3 Replies
Charly
Level 1
Author

Hi @SarinaS, thanks for the reply!

It was indeed a solution I had thought about (although I was talking about the column names, not the rows). I was mostly curious about whether variables can be used in a regex, and you answered that when you said they can't 😃

Do you know if there is an exhaustive list of places where variables can be used? I often wonder: when creating partitions, in a dataset name, in a check... I have to admit I'm not always sure.

Best regards,

Charly

SarinaS
Dataiker

Hi @Charly,

That makes sense; it's not always clear where you can use variables, especially in the UI. The best document that describes where you can use variables is this one. Variables can generally be used in code and in formula steps across DSS. In the UI itself, they work in some places (dataset settings, for example), as well as in other fields that state explicitly that variables are supported. There are some places where variable expansion hasn't been implemented, though, so if a variable doesn't work in a specific location, a formula or code solution is usually the simplest approach.

Thanks,
Sarina
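
When a field doesn't expand variables, the code workaround usually amounts to doing the substitution yourself before the value is used. Here is a minimal sketch of that idea, assuming the `${name}` placeholder syntax. The `expand` helper and the `variables` dict are hypothetical and written for illustration; they are not DSS functions.

```python
import re

# Matches placeholders of the form ${name}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def expand(template, variables, escape=False):
    """Replace ${name} placeholders in `template` with values from `variables`.

    Hypothetical helper for illustration. With escape=True each value is
    regex-escaped so it is matched literally inside a pattern.
    """
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            # Leave unknown placeholders untouched rather than guessing.
            return match.group(0)
        value = str(variables[name])
        return re.escape(value) if escape else value

    return PLACEHOLDER.sub(substitute, template)


# Assumption: a variable "date" with the value from the original question.
variables = {"date": "20230105"}

# The final "$" is not followed by "{", so it stays a regex end anchor.
pattern = expand(r".*${date}$", variables, escape=True)
print(pattern)
print(bool(re.search(pattern, "sales_20230105")))
```

Only `$` followed by `{name}` is treated as a placeholder, which is why the end-of-string anchor from the original question can coexist with a variable reference in the same pattern.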
