Preparation script: regexp processor creates no column

UserBird Dataiker

Accepted Solutions
jrouquie Dataiker

There is a common misconception about the regular expression processor: some people expect a single output column containing everything that the regular expression matched.

But this processor is actually more powerful:

  • First, it lets you create a column containing only part of what was matched. For instance, to extract the link from a simple HTML tag like `<a href="example.com">`, you could write `<a href="([^"]*)">`. The parentheses form a capture group and designate what you want to extract. In this case, the output column will contain `example.com`.

  • Second, it lets you create several columns at once: simply put several capture groups in the regexp! This also means that, confusingly, if the regexp contains no capture group, no column is created.
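
To make the "one column per capture group" rule concrete, here is a minimal sketch using Python's standard `re` module. It is only an analogy and not the processor's actual implementation: the helper name `extract_columns` is hypothetical, and the processor's regexp engine may differ from Python's in some details.

```python
import re


def extract_columns(pattern, value):
    """Hypothetical helper: return one output value per capture group.

    Mimics the behaviour described above; it is not the processor's real code.
    """
    regex = re.compile(pattern)
    match = regex.search(value)
    if match is None:
        # No match: still one (empty) value per capture group.
        return [None] * regex.groups
    return list(match.groups())


tag = '<a href="example.com">'

# No capture group: the pattern matches, but there is nothing to output.
no_capture = extract_columns(r'<a href="[^"]*">', tag)

# One capture group: one output value, containing only the captured part.
one_capture = extract_columns(r'<a href="([^"]*)">', tag)

# Two capture groups: two output values from a single pattern.
two_captures = extract_columns(r'<a (\w+)="([^"]*)">', tag)

print(no_capture)    # []
print(one_capture)   # ['example.com']
print(two_captures)  # ['href', 'example.com']
```

If your processor step produces no column, the first case is the likely culprit: wrap the part you want to keep in parentheses.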



 
