Add S3 path name parts as columns in dataset?

Solved!
MarkPundurs
Level 3
I have source S3 files whose paths are of the form <engine>_<yyyymmdd>/<tablename>.csv. I want to take all files named mytable.csv and create a dataset whose fields are those in the files, plus the fields "engine" and "date", with each record's values derived from that record's source file path. How can I accomplish this in DSS, with visual and/or code elements as needed? (Partitioning seems to do a lot of what I want, but I can't find how to turn partitions into dataset fields.)


Operating system used: Linux

1 Solution
AlexT
Dataiker

Hi @MarkPundurs ,

You can turn a partition into a field in the dataset using the processor described in (2) below.

1) You can also use a "Files from folder" dataset and filter it to include only the files you want.

Screenshot 2022-02-25 at 08.43.43.png

Screenshot 2022-02-25 at 08.44.21.png

2) You can use the "Enrich records with files info" processor in a Prepare recipe to add file information, such as each record's source file path, as columns in the output that the Prepare recipe creates.

Screenshot 2022-02-25 at 08.40.56.png
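Once each record carries its source file path, the "engine" and "date" fields from the question still have to be split out of the <engine>_<yyyymmdd> folder name. The parsing rule can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not DSS API: it works on plain dicts, the column name `source_path` is a hypothetical placeholder for whatever path column the processor actually produces, and `parse_source_path` and `add_path_fields` are illustrative helpers.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Folder names look like <engine>_<yyyymmdd>. The engine name may itself contain
# underscores, so the greedy group backtracks to the last "_" before eight digits.
FOLDER_RE = re.compile(r"^(?P<engine>.+)_(?P<date>\d{8})$")


def parse_source_path(path, table_name="mytable.csv"):
    """Return {"engine", "date"} for .../<engine>_<yyyymmdd>/<table_name>, else None."""
    p = PurePosixPath(path)
    if p.name != table_name:
        return None  # keep only files named after the table (as in step 1)
    m = FOLDER_RE.match(p.parent.name)
    if m is None:
        return None  # parent folder does not follow the <engine>_<yyyymmdd> scheme
    try:
        day = datetime.strptime(m.group("date"), "%Y%m%d").date()
    except ValueError:
        return None  # eight digits, but not a valid calendar date
    return {"engine": m.group("engine"), "date": day}


def add_path_fields(rows, path_column="source_path", table_name="mytable.csv"):
    """Yield copies of rows from matching files, extended with engine and date."""
    for row in rows:
        fields = parse_source_path(row[path_column], table_name)
        if fields is not None:
            yield {**row, **fields}


rows = [
    {"source_path": "/engine_a_20220101/mytable.csv", "value": 1},
    {"source_path": "/engineB_20220215/mytable.csv", "value": 2},
    {"source_path": "/engineB_20220215/other.csv", "value": 3},
]
out = list(add_path_fields(rows))
```

Here `out` keeps the two rows that came from mytable.csv, with "engine" set to "engine_a" and "engineB" and "date" parsed as `datetime.date` values; the row from other.csv is dropped. Rows whose path does not match the scheme are skipped rather than given empty fields, which is a design choice you may want to reverse.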

Let me know if this would work for you. 
