
Extract date in filename

J2U_45000
Level 1

Hi,

I created a column in Dataiku that retrieves the filename:

get(SOURCEFILENAME, lastIndexOf(SOURCEFILENAME, "/")+1, length(SOURCEFILENAME))

I have different files, which are named this way:

ZMMFI 2021.12.P1.xlsx
ZMMFI 2021.12.P2.xlsx
ZMMFI 2021.12.P3.xlsx
ZMMFI 2021.12.P4.xlsx
ZMMFI_OESV 2018.12 [08.01.2019].xlsx
ZMMFI_OESV 2019.12 [13.01.2020].xlsx
ZMMFI_OESV 2020.12 [27.01.2021].xlsx
ZMMFI_OESV 2022.02.xlsx
ZMMFI_OESX 2018.12 [08.01.2019].xlsx
ZMMFI_OESX 2019.12 [13.01.2020].xlsx
ZMMFI_OESX 2020.12 [27.01.2021].xlsx
ZMMFI_OSFI 2018.03 [08.01.2019].xlsx
ZMMFI_OSFI 2018.06 [08.01.2019].xlsx
ZMMFI_OSFI 2018.10 [08.01.2019].xlsx
ZMMFI_OSFI 2018.12 [08.01.2019].xlsx
ZMMFI_OSFI 2019.12 [13.01.2020].XLSX
ZMMFI_OSFI 2020.12 [27.01.2021].xlsx
ZMMFI_OSFI 2022.02.xlsx
ZMMFI_RTMV 2018.02 [08.01.2019].xlsx
ZMMFI_RTMV 2018.04 [08.01.2019].XLSX
ZMMFI_RTMV 2018.06 [08.01.2019].XLSX
ZMMFI_RTMV 2018.08 [08.01.2019].XLSX
ZMMFI_RTMV 2018.10 [08.01.2019].XLSX
ZMMFI_RTMV 2018.12 [08.01.2019].xlsx
ZMMFI_RTMV 2020.12.1 [27.01.2021].xlsx
ZMMFI_RTMV 2020.12.2 [27.01.2021].xlsx
ZMMFI_RTMV 2022.02.xlsx

How can I retrieve only the date of each file?

Thank you in advance for your help.

AlexT
Dataiker

Hi,

You could try the "Extract with regular expression" processor in a Prepare recipe.

To extract the first date:

(\s\d\d\d\d\.\d\d).*

To extract the date from within the square brackets:

\s\[(.*)\]\.
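
To check the two patterns against the filenames from the question outside of Dataiku, here is a minimal Python sketch. It is only an illustration, not the processor itself: the function name `extract_dates` and the sample list are made up for this example, and it assumes the pattern is searched anywhere in the filename rather than matched against the whole string. Note that the capture group of the first pattern includes the leading whitespace, so the sketch strips it.

```python
import re

# The two patterns from the reply, used unchanged.
FIRST_DATE = re.compile(r"(\s\d\d\d\d\.\d\d).*")
BRACKET_DATE = re.compile(r"\s\[(.*)\]\.")


def extract_dates(filename):
    """Return (first_date, bracket_date); each is None when not found.

    Hypothetical helper for illustration only.
    """
    first = FIRST_DATE.search(filename)
    bracket = BRACKET_DATE.search(filename)
    return (
        # Group 1 starts with the space matched by \s, so strip it.
        first.group(1).strip() if first else None,
        bracket.group(1) if bracket else None,
    )


# A few of the filenames listed in the question.
samples = [
    "ZMMFI 2021.12.P1.xlsx",
    "ZMMFI_OESV 2018.12 [08.01.2019].xlsx",
    "ZMMFI_OSFI 2019.12 [13.01.2020].XLSX",
    "ZMMFI_RTMV 2020.12.1 [27.01.2021].xlsx",
    "ZMMFI_RTMV 2022.02.xlsx",
]

for name in samples:
    print(name, "->", extract_dates(name))
```

Files without a bracketed date, such as `ZMMFI_RTMV 2022.02.xlsx`, yield `None` for the second value, so in a Prepare recipe that column would presumably stay empty for those rows.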