
Add S3 path name parts as columns in dataset?

Solved!
MarkPundurs
Level 3

I have source S3 files whose paths are of the form <engine>_<yyyymmdd>/<tablename>.csv. I want to take all files named mytable.csv and create a dataset whose fields are those in the files, PLUS the fields "engine" and "date", with values for each record derived from that record's source file path. How can I accomplish this in DSS, with visual and/or code elements as needed? (Partitioning seems to do a lot of what I want, but I can't find how to turn partitions into dataset fields.)


Operating system used: Linux

AlexT
Dataiker

Hi @MarkPundurs ,

You can turn the partition into a field in the dataset using the processor described in (2) below.

 

1) You can also use a "Files in Folder" dataset and filter the files you want to include.

Screenshot 2022-02-25 at 08.43.43.png

Screenshot 2022-02-25 at 08.44.21.png

2) You can use the "Enrich records with files info" processor in a Prepare recipe to add the source file path as a column in the dataset that the Prepare recipe outputs.

Screenshot 2022-02-25 at 08.40.56.png
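Once the file path is available as a column, the "engine" and "date" values still need to be split out of it. Here is a minimal sketch of that parsing in plain Python, assuming the path layout from the question (<engine>_<yyyymmdd>/<tablename>.csv). The function name parse_source_path and the returned keys are hypothetical, chosen for illustration only; this is not a DSS API, just logic you could adapt in a Python recipe or a Python step.

```python
import re
from datetime import datetime

# Assumed layout (taken from the question): <engine>_<yyyymmdd>/<tablename>.csv
# The engine name may itself contain underscores; the last "_" before the
# 8-digit date is treated as the separator.
PATH_PATTERN = re.compile(
    r"(?:^|/)(?P<engine>[^/]+)_(?P<date>\d{8})/(?P<table>[^/]+)\.csv$"
)


def parse_source_path(path):
    """Return engine, date (ISO format) and table parsed from a file path.

    Returns None when the path does not follow the assumed layout or when
    the 8 digits are not a valid calendar date.
    """
    match = PATH_PATTERN.search(path)
    if match is None:
        return None
    try:
        # Validate the date rather than trusting the digits blindly.
        parsed_date = datetime.strptime(match.group("date"), "%Y%m%d").date()
    except ValueError:
        return None
    return {
        "engine": match.group("engine"),
        "date": parsed_date.isoformat(),
        "table": match.group("table"),
    }


if __name__ == "__main__":
    # Hypothetical example path, not taken from a real bucket.
    print(parse_source_path("/spark_20220225/mytable.csv"))
```

Keeping only the rows where "table" equals "mytable" would then restrict the result to the files named mytable.csv. The same split could probably also be done visually with a pattern-extraction step in the Prepare recipe, but that is an assumption to check against your DSS version.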

Let me know if this would work for you. 
