Extracting filename from imported data files
I am trying to extract the file names of multiple CSV files. These are the steps I followed:
1. Folder (with uploaded CSV data files)
2. Created a dataset from these files
3. Prepare recipe, using [Misc] --> Enrich records with context information
4. Keyed in a column name under "output filename column"
However, the new column is empty for every record and the filename is not showing up.
How can we extract the filename for each row of data?
Answers
-
AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker
Hi Nikiko,
The steps you outlined should work - try moving it so that it's the first step in the prepare recipe?
-
I tried 2 routes:
1. Leads_All: Created dataset from multiple data files
2. test: Just grabbing one data file as the dataset
I added the Prepare recipe to both, and both returned empty values for the context information.
Did I miss a setting somewhere when creating the dataset?
-
AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker
Hi Nikiko,
There isn't any setting in the dataset that I'm aware of. Can you attach a screenshot of your Prepare recipe? I've attached mine with dummy data here for reference but couldn't replicate your issue.
-
I followed the same steps. As a test, I entered a column name for every field, and also tried with just the last three output columns, but it still doesn't work.
The columns for test3, test4, test5 and test 1 are all empty.
-
AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker
That's odd. I can't replicate the issue on my end. Just to confirm, this is with CSVs and using the DSS engine?
It might be worthwhile creating a support ticket and attaching the job log to it using the following instructions:
-
Yes, the files are all CSV files. I am using the DSS platform to upload the folder, and then the "Create dataset" icon to create the dataset before the Prepare recipe.
Okie, I shall log the case with Dataiku.
Great, thanks for your reply!
-
I managed to find a solution to this, from one of the Dataiku trainers during my training session.
It's due to the setting boxed in the image below:
the type has to be set to the CSV format.
As for the "skip next" lines setting, I had previously skipped a couple of lines and later set it back to zero (not sure if this is also related?).
Thank You.
-
Georghios Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 15 ✭✭✭
I have the same issue using JSON files and storing the output as a SQL table. It works when I save the output as a flat file on blob storage.
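
When the filename column stays empty in the Prepare recipe, one possible fallback is to attach the filename in code before the rows reach any recipe. The sketch below uses only the Python standard library and is not Dataiku's own mechanism. The function name `read_csvs_with_filename` and the column name `source_file` are made up for illustration, and it assumes the CSV files sit in a local directory, are UTF-8 encoded, and have a header row.

```python
import csv
from pathlib import Path


def read_csvs_with_filename(folder, column="source_file"):
    """Read every .csv file in `folder` and tag each row with its file name.

    `column` is a hypothetical name for the added field; if a file already
    has a column with that name, its value is overwritten.
    """
    rows = []
    # Sort the paths so the output order is stable across runs.
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            # DictReader uses the first line of each file as the header.
            for row in csv.DictReader(handle):
                row[column] = path.name  # file name only, no directory part
                rows.append(row)
    return rows
```

Each returned row is a dictionary of the original columns plus the added filename field, so the combined rows can be written back out as a single CSV and used as the dataset source.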