Extracting filename from imported data files

Nikiko Registered Posts: 5 ✭✭✭

I am trying to extract the file name of multiple CSV files. These are my steps:

1. Upload the CSV data files to a folder.

2. Create a dataset from these files.

3. In a Prepare recipe, add the "Enrich records with context information" processor (under [Misc]).

4. Enter a column name under "Output filename column".

However, the new column is empty for every record, and the filename does not show up.

How can we extract the filename for each row of data?
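For reference, here is a plain-Python sketch of what the output filename column is expected to contain: every row tagged with the name of the CSV file it came from. This is not how DSS implements the processor. It uses only the standard library, and the helper name `read_with_filename` and the default column name `source_file` are hypothetical.

```python
import csv
import os
import tempfile


def read_with_filename(folder, filename_column="source_file"):
    """Read every .csv file in `folder` and tag each row with its file name.

    Hypothetical helper for illustration only; it is not part of any Dataiku API.
    """
    rows = []
    # Sort the listing so files are processed in a deterministic order
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".csv"):
            continue  # skip anything that is not a CSV file
        path = os.path.join(folder, name)
        with open(path, newline="", encoding="utf-8") as f:
            for record in csv.DictReader(f):
                # Extra column holding the source file name
                record[filename_column] = name
                rows.append(record)
    return rows


if __name__ == "__main__":
    # Build two small dummy CSV files in a temporary folder
    with tempfile.TemporaryDirectory() as folder:
        for name, value in [("leads_a.csv", "1"), ("leads_b.csv", "2")]:
            with open(os.path.join(folder, name), "w", newline="", encoding="utf-8") as f:
                f.write("id\n" + value + "\n")
        for row in read_with_filename(folder):
            print(row)
```

Running it prints one dictionary per row, each carrying a `source_file` key with the name of the file the row came from.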

Answers

  • AlexGo
    AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker

    Hi Nikiko,

    The steps you outlined should work. Could you try moving the processor so that it is the first step in the Prepare recipe?

  • Nikiko
    Nikiko Registered Posts: 5 ✭✭✭

    I tried two routes:

    1. Leads_All: a dataset created from multiple data files

    2. test: a dataset created from just one data file

    I added the processor in a Prepare recipe on both, and both returned empty values for the context information columns.

    Did I miss a setting somewhere when creating the dataset?

    Attachment: image001.png

  • AlexGo
    AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker

    Hi Nikiko,

    There isn't any dataset setting that I'm aware of. Can you attach a screenshot of your Prepare recipe? I've attached mine, with dummy data, for reference; I couldn't replicate your issue.

    Attachments: Screen Shot 2022-04-12 at 8.59.11 AM.png, Screen Shot 2022-04-12 at 8.59.56 AM.png, Screen Shot 2022-04-12 at 9.00.24 AM.png

  • Nikiko
    Nikiko Registered Posts: 5 ✭✭✭

    I followed the same steps. As a test, I entered a column name in every output field, and I also tried filling in only the last three output columns. It still doesn't work.

    Attachment: image001 (1).png

    The columns test3, test4, test5, and test1 are all empty.

    Attachment: image002 (1).png

  • AlexGo
    AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker

    That's odd. I can't replicate the issue on my end. Just to confirm, this is with CSVs and using the DSS engine?

    It might be worth creating a support ticket and attaching the job log to it, following these instructions:

    https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html#editor-support-for-dataiku-customers

  • Nikiko
    Nikiko Registered Posts: 5 ✭✭✭

    Yes, the files are all CSV files. I am using DSS to upload the folder, and the "Create dataset" icon to create the dataset before the Prepare recipe.

    Okay, I will log a case with Dataiku.

    Thanks a lot for your reply!

  • Nikiko
    Nikiko Registered Posts: 5 ✭✭✭

    I managed to find a solution with help from one of the Dataiku trainers during my training session.

    It's due to the setting boxed in the image below: the type needs to be set to CSV format.

    As for the "skip next lines" setting, I had previously skipped a couple of lines and later set it back to zero (not sure whether this is also related).

    Thank you.

  • CoreyS
    CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭

    Thank you for sharing your solution, @Nikiko!

  • Georghios
    Georghios Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 15 ✭✭✭

    I have the same issue with JSON files when storing the output as an SQL table. It works when I save it as a flat file on blob storage.
