Regex function to return string between 2 characters
I'm trying to create a regex function that gives me the string between 2 characters
I have the string below
word1_word2_word3_word4_word5_word6_word7_word8_length_string.txt
and I'm trying to return everything after the 7th instance of "_" and before ".txt"
Desired output: word8_length_string
Is there a way to use a regex function / regex tool to accomplish this?
Operating system used: Windows
Answers
-
louisbarjon Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 9 Dataiker
Hello,
What exactly is your context?
If you are using a Prepare recipe, you can add a Formula step with this regular expression:
match(your_column_name, '^(?:[^_]*_){7}(.*)\.txt$')[0]
Note that this regexp does exactly what you describe: it skips the first 7 instances of _ and then returns everything before .txt
More info about the formula processor here
If you are in a Python code recipe, the same regular expression will work as well.
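As a rough sketch of the Python route, assuming only the standard `re` module (the helper name `extract_after_seventh_underscore` is made up for illustration and is not a Dataiku function), the same pattern could be applied like this:

```python
import re

# Same pattern as in the formula: skip seven "<non-underscore chars>_" groups,
# then capture everything up to a trailing ".txt".
PATTERN = re.compile(r'^(?:[^_]*_){7}(.*)\.txt$')

def extract_after_seventh_underscore(filename):
    """Return the captured text, or None when the name does not match."""
    m = PATTERN.match(filename)
    return m.group(1) if m else None

name = "word1_word2_word3_word4_word5_word6_word7_word8_length_string.txt"
print(extract_after_seventh_underscore(name))  # word8_length_string
```

Names with fewer than seven underscores, or without the .txt suffix, return None here rather than raising, which is usually easier to handle when the function is applied to a whole column.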
Louis
-
Some alternatives:
- For the regex, it really depends on what the constraints are. For instance,
^.*_(.+)\.\w+$
would match between the last _ and the extension; on the example above that captures only "string", not "word8_length_string". See (and edit, explain, play with) the test cases and regex here
- For the tool, you can use many things depending on the need:
  - a python recipe
  - a step in a data preparation recipe, with multiple possibilities
    - python step
    - formula step (as suggested by Louis)
    - text extraction step (probably the simplest for a simple need), with its
    - more exotic options
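To make the difference between the two patterns concrete, here is a small sketch using Python's standard `re` module (the helper name `last_segment` is hypothetical). It applies the "last _ to extension" regex from this answer to the file name in the question:

```python
import re

# Greedy ".*_" runs up to the LAST underscore; "\.\w+" accepts any extension.
LAST_SEGMENT = re.compile(r'^.*_(.+)\.\w+$')

def last_segment(filename):
    """Return the text between the last "_" and the extension, or None."""
    m = LAST_SEGMENT.match(filename)
    return m.group(1) if m else None

name = "word1_word2_word3_word4_word5_word6_word7_word8_length_string.txt"
print(last_segment(name))  # string
```

So this variant suits names where the wanted part never contains an underscore; when it can, as in the question, counting the first 7 underscores is the safer constraint.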