Extract date from filename

J2U_45000

Hi,

I created a column in Dataiku that retrieves the filename:

get(SOURCEFILENAME, lastIndexOf(SOURCEFILENAME, "/")+1, length(SOURCEFILENAME))
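
For reference, the same "keep everything after the last /" logic can be sketched in Python, for example in a Python recipe. This assumes paths use "/" as the separator, as the formula does; the function name and the sample path are hypothetical.

```python
def filename_from_path(source_path: str) -> str:
    """Return the part of the path after the last '/' (hypothetical helper)."""
    # str.rfind returns -1 when there is no '/', so the slice then starts at 0
    # and the whole string is returned unchanged.
    return source_path[source_path.rfind("/") + 1:]


# Hypothetical path, for illustration only
print(filename_from_path("/data/exports/ZMMFI 2021.12.P1.xlsx"))
```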

I have several files, named as follows:

ZMMFI 2021.12.P1.xlsx
ZMMFI 2021.12.P2.xlsx
ZMMFI 2021.12.P3.xlsx
ZMMFI 2021.12.P4.xlsx
ZMMFI_OESV 2018.12 [08.01.2019].xlsx
ZMMFI_OESV 2019.12 [13.01.2020].xlsx
ZMMFI_OESV 2020.12 [27.01.2021].xlsx
ZMMFI_OESV 2022.02.xlsx
ZMMFI_OESX 2018.12 [08.01.2019].xlsx
ZMMFI_OESX 2019.12 [13.01.2020].xlsx
ZMMFI_OESX 2020.12 [27.01.2021].xlsx
ZMMFI_OSFI 2018.03 [08.01.2019].xlsx
ZMMFI_OSFI 2018.06 [08.01.2019].xlsx
ZMMFI_OSFI 2018.10 [08.01.2019].xlsx
ZMMFI_OSFI 2018.12 [08.01.2019].xlsx
ZMMFI_OSFI 2019.12 [13.01.2020].XLSX
ZMMFI_OSFI 2020.12 [27.01.2021].xlsx
ZMMFI_OSFI 2022.02.xlsx
ZMMFI_RTMV 2018.02 [08.01.2019].xlsx
ZMMFI_RTMV 2018.04 [08.01.2019].XLSX
ZMMFI_RTMV 2018.06 [08.01.2019].XLSX
ZMMFI_RTMV 2018.08 [08.01.2019].XLSX
ZMMFI_RTMV 2018.10 [08.01.2019].XLSX
ZMMFI_RTMV 2018.12 [08.01.2019].xlsx
ZMMFI_RTMV 2020.12.1 [27.01.2021].xlsx
ZMMFI_RTMV 2020.12.2 [27.01.2021].xlsx
ZMMFI_RTMV 2022.02.xlsx

How can I retrieve only the date of each file?

Thank you in advance for your help.

Answers

  • Alexandru

    Hi,

    You could try the "Extract with regular expression" processor in a Prepare recipe.

    To extract the first date (year and month, e.g. 2021.12):

    \s(\d\d\d\d\.\d\d)

    (The \s is kept outside the capture group so the extracted value does not start with a space.)
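
The year-and-month pattern can be checked outside Dataiku with Python's re module. This is a sketch, not the processor itself: Python and Java regex syntax differ in places, but the constructs used here (\s, \d, \., a capture group) behave the same in both. The group is placed after \s so the leading space is not captured; the helper name is hypothetical.

```python
import re
from typing import Optional

# Year and month right after a whitespace character, e.g. " 2021.12".
# \s is outside the group, so only "2021.12" is captured.
FIRST_DATE = re.compile(r"\s(\d\d\d\d\.\d\d)")


def first_date(filename: str) -> Optional[str]:
    """Return the first YYYY.MM after a space, or None (hypothetical helper)."""
    match = FIRST_DATE.search(filename)
    return match.group(1) if match else None


for name in ["ZMMFI 2021.12.P1.xlsx",
             "ZMMFI_OESV 2018.12 [08.01.2019].xlsx",
             "ZMMFI_RTMV 2020.12.1 [27.01.2021].xlsx",
             "ZMMFI_RTMV 2022.02.xlsx"]:
    print(name, "->", first_date(name))
```

Because search returns the leftmost match, the bracketed date is never picked up here: " [" is not followed by four digits.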

    To extract the date inside the square brackets (e.g. 08.01.2019):

    \s\[(.*)\]\.
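
If a real date value is needed rather than a string, the bracketed part can also be parsed. The sketch below uses the same bracket pattern with Python's re and datetime modules, and assumes the bracketed dates are day.month.year, which the listed names suggest (27.01.2021 can only be day 27). Files without brackets return None; the helper name is hypothetical.

```python
import re
from datetime import date, datetime
from typing import Optional

# Text between square brackets that is followed by a dot, e.g. " [08.01.2019]."
BRACKET_DATE = re.compile(r"\s\[(.*)\]\.")


def bracket_date(filename: str) -> Optional[date]:
    """Parse the bracketed DD.MM.YYYY date, or return None (hypothetical helper)."""
    match = BRACKET_DATE.search(filename)
    if match is None:
        return None  # e.g. "ZMMFI_RTMV 2022.02.xlsx" has no bracketed date
    # Assumption: day.month.year order, as in "27.01.2021"
    return datetime.strptime(match.group(1), "%d.%m.%Y").date()


print(bracket_date("ZMMFI_OESV 2020.12 [27.01.2021].xlsx"))  # 2021-01-27
print(bracket_date("ZMMFI_RTMV 2022.02.xlsx"))               # None
```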
