Standardize the syntax for regular expressions (regex) across all uses in Dataiku DSS

User Story

As a lazy data analyst with limited experience with regular expressions, I would like the implementation and syntax of regular expressions to be the same across all uses of regex in DSS. This would give me more confidence in my use of regex and ultimately increase the power of DSS.

Nice to Have:

  • It would be nice if the new v9 regular expression helper showed up in every place where a regex can be used in DSS. This should include appropriate in situ examples from column names, cells, or wherever I might be trying to match data.
  • In fact, from a UI perspective, I wish the helper did not show up as a text string under each field that accepts a regex. Instead, a simple icon used consistently across DSS could appear after any regex-enabled field and open the Regex Helper. Today, for most of my regex work, I copy examples to one of the regex websites (being careful not to release confidential information), figure out my expression, and copy the result back to DSS.😕

Notes:

  • There appear to be several different implementations of regular expressions in DSS (possibly because DSS is built on multiple libraries from Java, Python, and elsewhere). For example:
    • In some cases, in shaker formulas, it appears that I have to escape a single backslash \ as a double backslash \\. In other places this does not seem to be necessary.
    • In some places it appears that I have to put quotes around regular expressions, for example in visual recipe formulas, and in other places I don't, for example in regex-based column selection.
    • In some cases it appears that I need to create groups with parentheses ( ) to make a match; in other cases it appears that I don't.
    • In some cases I need to account for every character in the string to make a match, padding my criteria with something like .* or [\s\S]* on both ends; in other cases I do not.
    • In some cases it appears that I need to include the leading and closing slashes, for example /.*/, and in other cases it appears that I can just use something like .*
    • This is sometimes made more difficult by cell-level "duck typing", which changes something like 08840 to 8840, or 012345678901234567890 to 1.234567890E19. Regular expressions that work on the string versions of these cells fail on the duck-typed integer or decimal versions.
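Each inconsistency listed above corresponds to an independent choice that any regex-enabled field has to make. The sketch below uses Python's standard `re` module to reproduce those choices side by side. It illustrates the general behaviours only; it is not a description of how DSS implements any particular processor, formula function, or column selector, and all variable names are hypothetical.

```python
import re

text = "order 123456 shipped"

# 1. Escaping. In a quoted string literal (as in many formula languages) the
#    backslash is consumed once by the string parser, so it must be doubled
#    for the regex engine to receive \d. A raw string, like a plain pattern
#    field, passes it through unchanged. Both lines yield the same pattern.
pattern_from_literal = "\\d{6}"
pattern_from_field = r"\d{6}"

# 2. "Find anywhere" vs "match the whole value". re.search scans the string;
#    re.fullmatch only succeeds if the pattern covers every character, which
#    is why some tools seem to need .* padding on both ends.
found_anywhere = re.search(pattern_from_field, text) is not None             # True
whole_value = re.fullmatch(pattern_from_field, text) is not None             # False
padded = re.fullmatch(r".*" + pattern_from_field + r".*", text) is not None  # True

# "." does not match a newline by default; [\s\S]* (or re.DOTALL) does.
multiline = "id\n123456"
dot_pad = re.fullmatch(r".*\d{6}", multiline) is not None         # False
class_pad = re.fullmatch(r"[\s\S]*\d{6}", multiline) is not None  # True

# 3. Groups. Without parentheses, findall returns whole matches; with one
#    group it returns only the text captured by that group.
whole_matches = re.findall(r"order \d+", text)    # ['order 123456']
group_matches = re.findall(r"order (\d+)", text)  # ['123456']

# 4. Slash delimiters. /.../ delimits a regex literal in languages such as
#    JavaScript, but in Python the slashes are ordinary characters to match.
with_slashes = re.search(r"/\d+/", text) is not None   # False: no slashes in text
without_slashes = re.search(r"\d+", text) is not None  # True

# 5. Type inference. Converting a string cell to a number and back to text
#    changes what the regex sees.
zip_code = "08840"
as_int_text = str(int(zip_code))                             # '8840'
zip_ok = re.fullmatch(r"\d{5}", zip_code) is not None        # True
zip_after = re.fullmatch(r"\d{5}", as_int_text) is not None  # False

long_id = "012345678901234567890"
as_float_text = str(float(long_id))  # Python renders this in scientific notation
id_ok = re.fullmatch(r"\d{21}", long_id) is not None           # True
id_after = re.fullmatch(r"\d{21}", as_float_text) is not None  # False
```

Python's float formatting differs from the 1.234567890E19 shown in DSS, but the effect is the same: once a value has been converted to a number, a pattern written for the original string no longer matches.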

Regex is powerful and great.  However, making the experience more consistent would be very helpful.  The Regex helper added in DSS v9 is a nice start in this direction.  

--Tom

Here is another example where the different ways of entering regular expressions in Dataiku DSS can be confusing.  

https://community.dataiku.com/t5/Using-Dataiku/Issue-with-regex-arrayLen-match-min-6-digits-in-strin...

--Tom

AshleyW
Dataiker

Thanks for providing your suggestion, we're looking into this further and will provide an update if and when available.

Status changed to: Parked

MichaelG
Community Manager
Status changed to: In the Backlog