
select files with regex

Level 2

We want to select files from a cloud-based folder using a regex. We are providing CN_CRPROD as the inclusion rule to select only CN_CRPROD.csv, but it doesn't seem to be working.

/2019/06/04/20190604_CN_CRJRNL.csv    210.91 KB    2019/12/11-18:12:20
/2019/06/04/20190604_CN_CRPROD.csv    4.61 KB    2019/12/11-18:12:19
 
We tested the regex on https://regex101.com/, and there it selects only CN_CRPROD.
 
How can we get it working on DSS?
5 Replies
Dataiker

Hi,

Could you please give a bit more context? Are you trying to get files in a Python recipe, or in a "Files in folder" dataset?

What about the regex you used?

Level 2
Author

For context, we are reading files from Azure Blob Storage, where they are stored in a date-based directory structure.

Screenshot 2020-11-17 210400.jpg

The regex setting is shown in the screenshot.

Dataiker

Can you replace your regex "CN_CRPROD" with ".*CN_CRPROD.*" (or ".*CN_CRPROD\.csv" if you want to be more specific)? The pattern needs to cover the whole path; on its own, "CN_CRPROD" won't match the other parts of the filename.
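
To illustrate why the bare pattern looks fine in an online tester but selects nothing in DSS, here is a minimal Java sketch. It assumes the inclusion rule is applied as a full match against the whole path (Matcher.matches), whereas a tester typically highlights a partial match (Matcher.find). The class and method names are hypothetical and are not part of DSS.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexMatchDemo {

    // Keep only the paths that the regex matches in full.
    // Assumption: this mirrors how the inclusion rule is evaluated.
    static List<String> selectFullMatch(List<String> paths, String regex) {
        Pattern p = Pattern.compile(regex);
        return paths.stream()
                .filter(path -> p.matcher(path).matches())
                .collect(Collectors.toList());
    }

    // Keep the paths that contain a match anywhere in the string,
    // which is what an online regex tester usually shows.
    static List<String> selectPartialMatch(List<String> paths, String regex) {
        Pattern p = Pattern.compile(regex);
        return paths.stream()
                .filter(path -> p.matcher(path).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "/2019/06/04/20190604_CN_CRJRNL.csv",
                "/2019/06/04/20190604_CN_CRPROD.csv");

        // Partial match: finds the CN_CRPROD file.
        System.out.println(selectPartialMatch(paths, "CN_CRPROD"));
        // Full match with the bare pattern: selects nothing.
        System.out.println(selectFullMatch(paths, "CN_CRPROD"));
        // Full match with a leading ".*" and an escaped dot: selects the CN_CRPROD file.
        System.out.println(selectFullMatch(paths, ".*CN_CRPROD\\.csv"));
    }
}
```

Note that inside a Java string literal the backslash is doubled ("\\."), while in the DSS settings field the pattern is entered as plain text (".*CN_CRPROD\.csv").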

Level 2
Author

Thank you, it worked.

Can you point us to a guide on the regex syntax we need to use in Dataiku DSS?

Dataiker

For file listing (and most of the time), DSS uses Java regular expressions (java.util.regex.Pattern). You can, for example, find a reference here.