Regex expression to extract substring in a large string

Solved!
vkana
Level 2
Hi

I wrote the following regex in Dataiku to extract the 8-digit number starting with 9.
Example: 93809629 or 93650953
Regex: [LCC|FACTURE\ |FACTURE][A-Z]{1,7}\s(\d\d\d\d\d\d\d\d)\
But it doesn't match in all of the situations below.


1) LBB2022 FACT 93809929
2) LBBF.90930153 - 06/12/2021
3) LBBING FACTURE NO90857587 DU 02/12/21AFFRANCHISSEMENT
4) LBB90945758 CLT 662063
5) LBBFACT N 90903643 DU 02/12/2021
6) LBB93856595 FACTUREDEPARTEMENT
7) LBBFACTURE 93720887FRAIS AFFRANCHISSEMENTUSSY
8) LBBFACT 93628786 DU 01/12/22
9) LBBFACTURE NO 93741852 DU 05/12/2022AFFRANCHISSEMENT
10) LBBN FACT : 93650972 REEUE LE 04/12/2022AFFRANCHISSEMENTS

Could anyone help me please?
Thank you so much


Operating system used: Windows

1 Solution
AdrienL
Dataiker

Hi, note that this has little to do with DSS itself 🙂

You can try: 

\D(9\d{7})(?!\d)
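  • \D matches any single character that is not a digit, so the match cannot start in the middle of a longer number (it also means the number must be preceded by some character, so it will not match at the very start of the string)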
  • ( starts a capturing group for extraction
  • 9 matches a 9 (duh)
  • \d{7} matches the next 7 digits
  • ) ends the capturing group
  • (?!\d) is a negative lookahead ensuring it is not followed by another digit
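
As a quick check outside regex101, the pattern above can be run against the ten sample lines from the question. This is only a minimal sketch in Python: the helper name `extract_number` is hypothetical, and the second pattern, which swaps `\D` for a negative lookbehind so that a number at the very start of the string is also found, is an assumed variant rather than part of the answer. DSS may use a different regex engine, so treat this as an illustration of the pattern and not of DSS behavior.

```python
import re

# Pattern from the answer: one non-digit, then 9 plus 7 digits (captured),
# not followed by another digit.
PATTERN = re.compile(r"\D(9\d{7})(?!\d)")

# Assumed variant (not from the answer): a negative lookbehind replaces \D,
# so no preceding character is required.
PATTERN_LOOKBEHIND = re.compile(r"(?<!\d)(9\d{7})(?!\d)")

# The ten sample lines from the question.
SAMPLES = [
    "LBB2022 FACT 93809929",
    "LBBF.90930153 - 06/12/2021",
    "LBBING FACTURE NO90857587 DU 02/12/21AFFRANCHISSEMENT",
    "LBB90945758 CLT 662063",
    "LBBFACT N 90903643 DU 02/12/2021",
    "LBB93856595 FACTUREDEPARTEMENT",
    "LBBFACTURE 93720887FRAIS AFFRANCHISSEMENTUSSY",
    "LBBFACT 93628786 DU 01/12/22",
    "LBBFACTURE NO 93741852 DU 05/12/2022AFFRANCHISSEMENT",
    "LBBN FACT : 93650972 REEUE LE 04/12/2022AFFRANCHISSEMENTS",
]


def extract_number(text, pattern=PATTERN):
    """Return the first captured 8-digit number starting with 9, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    for sample in SAMPLES:
        print(extract_number(sample), "<-", sample)
```

The two patterns only differ when the number is the first thing in the string: the `\D` version needs a character in front of the 9, the lookbehind version does not.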

See it with your examples here: https://regex101.com/r/K30Fww/1

vkana
Level 2
Author

It works correctly!!! Thank you so much.