Regex expression to extract substring in a large string

vkana
vkana Registered Posts: 4 ✭✭✭

Hi

I wrote the following regex in Dataiku to extract the 8-digit number starting with 9.
Example: 93809629 or 93650953
Regex: [LCC|FACTURE\ |FACTURE][A-Z]{1,7}\s(\d\d\d\d\d\d\d\d)\
But it doesn't match in all of the situations below.


1) LBB2022 FACT 93809929
2) LBBF.90930153 - 06/12/2021
3) LBBING FACTURE NO90857587 DU 02/12/21AFFRANCHISSEMENT
4) LBB90945758 CLT 662063
5) LBBFACT N 90903643 DU 02/12/2021
6) LBB93856595 FACTUREDEPARTEMENT
7) LBBFACTURE 93720887FRAIS AFFRANCHISSEMENTUSSY LBBFACT 93628786 DU 01/12/22
9) LBBFACTURE NO 93741852 DU 05/12/2022AFFRANCHISSEMENT
10) LBBN FACT : 93650972 REEUE LE 04/12/2022AFFRANCHISSEMENTS

Could anyone help me please?
Thank you so much


Operating system used: Windows

Best Answer

  • AdrienL
    AdrienL Dataiker, Alpha Tester Posts: 196 Dataiker
    edited July 17 Answer ✓

    Hi, note that this has little to do with DSS itself.

    You can try:

    \D(9\d{7})(?!\d)
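    • \D matches any single character that is not a digit (so the match cannot start in the middle of a longer number); since it needs a character to consume, it will not match a number at the very start of the string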
    • ( starts a capturing group for extraction
    • 9 matches a 9 (duh)
    • \d{7} matches the next 7 digits
    • ) ends the capturing group
    • (?!\d) is a negative lookahead ensuring it is not followed by another digit

    See it with your examples here: https://regex101.com/r/K30Fww/1
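
For anyone who wants to check the pattern outside regex101, here is a minimal Python sketch that runs it against the examples from the question. The original attempt fails mainly because square brackets define a character class (a set of single characters), not a list of alternative words. The helper name `extract_numbers` and the second pattern, which replaces `\D` with a negative lookbehind `(?<!\d)` so that a number at the very start of the string is also found, are illustrative assumptions and not part of the answer above.

```python
import re

# Pattern from the answer: one non-digit, then a captured 8-digit number
# starting with 9, not followed by another digit.
PATTERN = re.compile(r"\D(9\d{7})(?!\d)")

# Assumed variant (not from the answer): a negative lookbehind instead of \D,
# so a number at the very start of the string is matched as well.
PATTERN_LOOKBEHIND = re.compile(r"(?<!\d)(9\d{7})(?!\d)")

# Sample strings copied from the question.
samples = [
    "LBB2022 FACT 93809929",
    "LBBF.90930153 - 06/12/2021",
    "LBBING FACTURE NO90857587 DU 02/12/21AFFRANCHISSEMENT",
    "LBB90945758 CLT 662063",
    "LBBFACT N 90903643 DU 02/12/2021",
    "LBB93856595 FACTUREDEPARTEMENT",
    "LBBFACTURE 93720887FRAIS AFFRANCHISSEMENTUSSY LBBFACT 93628786 DU 01/12/22",
    "LBBFACTURE NO 93741852 DU 05/12/2022AFFRANCHISSEMENT",
    "LBBN FACT : 93650972 REEUE LE 04/12/2022AFFRANCHISSEMENTS",
]


def extract_numbers(text, pattern=PATTERN):
    """Return every captured 8-digit number found in text (hypothetical helper)."""
    # With a single capturing group, findall returns the group contents.
    return pattern.findall(text)


if __name__ == "__main__":
    for line in samples:
        print(line, "->", extract_numbers(line))
```

Because `findall` returns every occurrence, a line that contains two invoice numbers yields both of them; take the first element if only one is wanted.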
