Preparation script: regexp processor creates no column

Dataiker

There is a common misconception about the regular expression processor: some people expect it to produce a single output column containing everything the regular expression matched.



But this processor is actually more powerful:




  • First, it lets you create a column containing only part of what was matched. For instance, to extract the link from a simple HTML tag like `<a href="example.com">`, you could write `<a href="([^"]*)">`. The parentheses form a capture group and designate what you want to extract. In this case, the output column will contain `example.com`.

  • Second, it lets you create several columns at once: simply put several captures in the regexp! This also means that, confusingly, if the regexp contains no capture, no column is created.
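
To make the "one column per capture" behavior concrete, here is a minimal sketch in plain Python using the standard `re` module. It is only an illustration of how capture groups behave, not the processor's actual implementation; the helper name `extract_columns` is hypothetical, and the processor's regex engine may differ from Python's in details.

```python
import re


def extract_columns(pattern: str, value: str) -> list[str]:
    """Return one value per capture group (hypothetical helper).

    This mimics the idea of "one output column per capture":
    a pattern without any capture group yields an empty list.
    """
    match = re.search(pattern, value)
    if match is None:
        return []
    # groups() contains only the captured parts, never the whole match
    return list(match.groups())


html = '<a href="example.com">'

# No capture: the pattern matches, but there is nothing to output
print(extract_columns(r'<a href="[^"]*">', html))       # []

# One capture: a single value, the link only
print(extract_columns(r'<a href="([^"]*)">', html))     # ['example.com']

# Two captures: two values, so two columns
print(extract_columns(r'<a (href)="([^"]*)">', html))   # ['href', 'example.com']
```

So if the processor creates no column, the first thing to check is whether the pattern wraps the part you want in parentheses.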