select files with regex

pi_485
pi_485 Partner, L2 Admin, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 6 Partner

We want to select files from a cloud-based folder using a regex. We are providing the inclusion rule CN_CRPROD to select only the CN_CRPROD.csv file, but it doesn't seem to be working.

/2019/06/04/20190604_CN_CRJRNL.csv 210.91 KB 2019/12/11-18:12:20
/2019/06/04/20190604_CN_CRPROD.csv 4.61 KB 2019/12/11-18:12:19
We tested the regex on https://regex101.com/, and there it selects only CN_CRPROD.
How can we get it working in DSS?

Best Answer

  • jfyuen
    jfyuen Dataiker, Registered Posts: 12 Dataiker
    Answer ✓

    Can you replace your regex "CN_CRPROD" with ".*CN_CRPROD.*" (or ".*CN_CRPROD\.csv" if you want to be more specific)? The regex has to match the whole path; on its own, "CN_CRPROD" does not cover the other parts of the filename.
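
A minimal Java sketch of the difference, using the two paths from the question. It assumes DSS applies the inclusion rule as a full match against the path, which is what the fix above suggests but is not confirmed here. The class RegexMatchDemo and its two helpers are hypothetical names used only for illustration. Matcher.matches() requires the whole input to match, while Matcher.find() accepts a match anywhere, which is how regex101 behaves by default.

```java
import java.util.List;
import java.util.regex.Pattern;

public class RegexMatchDemo {
    // Hypothetical helper: true only when the regex matches the entire path.
    static boolean fullMatch(String regex, String path) {
        return Pattern.compile(regex).matcher(path).matches();
    }

    // Hypothetical helper: true when the regex matches anywhere in the path.
    static boolean partialMatch(String regex, String path) {
        return Pattern.compile(regex).matcher(path).find();
    }

    public static void main(String[] args) {
        // Paths taken from the folder listing in the question.
        List<String> paths = List.of(
            "/2019/06/04/20190604_CN_CRJRNL.csv",
            "/2019/06/04/20190604_CN_CRPROD.csv");
        List<String> regexes = List.of(
            "CN_CRPROD", ".*CN_CRPROD.*", ".*CN_CRPROD\\.csv");

        for (String regex : regexes) {
            for (String path : paths) {
                System.out.printf("%-20s %-36s full=%b partial=%b%n",
                    regex, path, fullMatch(regex, path), partialMatch(regex, path));
            }
        }
    }
}
```

With a full match, the bare "CN_CRPROD" pattern selects neither file, while both ".*" variants select only the CN_CRPROD file.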

Answers

  • jfyuen
    jfyuen Dataiker, Registered Posts: 12 Dataiker

    Hi,

    Could you please give a bit more context? Are you trying to get files in a Python recipe, or in a "Files in folder" dataset?

    What about the regex you used?

  • pi_485
    pi_485 Partner, L2 Admin, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 6 Partner

    For context, we are reading files from Azure Blob Storage, where they are stored in a date-based directory structure.

    Screenshot 2020-11-17 210400.jpg

    The regex setting is shown in the screenshot.

  • pi_485
    pi_485 Partner, L2 Admin, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 6 Partner

    Thank you, it worked.

    Can you point us to a guide on the regex syntax we need to use in Dataiku DSS?

  • jfyuen
    jfyuen Dataiker, Registered Posts: 12 Dataiker

    For file listing (and in most other places), DSS uses Java regular expressions (java.util.regex.Pattern). The Javadoc for that class is a good reference.
