Add S3 path name parts as columns in dataset?

MarkPundurs · February 2022

I have source S3 files whose paths are of the form <engine>_<yyyymmdd>/<tablename>.csv. I want to take all files named mytable.csv and create a dataset whose fields are those in the files, PLUS the fields "engine" and "date", with values for each record derived from that record's source file path. How can I accomplish this in DSS, with visual and/or code elements as needed? (Partitioning seems to do a lot of what I want, but I can't find how to turn partitions into dataset fields.)

Operating system used: Linux

Alexandru · February 2022

Hi @MarkPundurs,

You can turn a partition into a field in the dataset using the processor described in (2) below.

1) You can also use a "Files from folder" dataset and filter it down to the files you want to include (here, the ones named mytable.csv).


2) You can use the "Enrich records with files info" processor in a Prepare recipe to add the source file path of each record as a column in the output dataset that the Prepare recipe creates.
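Once the file path is available as a column, it still has to be split into "engine" and "date". Here is a minimal sketch of that parsing step in plain Python, assuming the path ends with <engine>_<yyyymmdd>/<tablename>.csv as in the question. The function and class names (parse_source_path, PathParts) are made up for illustration and are not part of DSS; you would call something like this from a Python step or recipe on whatever column holds the path.

```python
import datetime as dt
import re
from typing import NamedTuple, Optional

# Assumed layout of the last two path components: <engine>_<yyyymmdd>/<tablename>.csv
# The engine name may itself contain underscores; the date is the final 8 digits.
_PATH_RE = re.compile(
    r"(?:^|/)(?P<engine>[^/]+)_(?P<date>\d{8})/(?P<table>[^/]+)\.csv$"
)


class PathParts(NamedTuple):
    """Hypothetical container for the values derived from one file path."""
    engine: str
    date: dt.date
    table: str


def parse_source_path(path: str) -> Optional[PathParts]:
    """Return engine, date and table name for a path, or None if it does not match."""
    match = _PATH_RE.search(path)
    if match is None:
        return None
    try:
        # Reject strings that look like dates but are not valid (e.g. month 13).
        day = dt.datetime.strptime(match.group("date"), "%Y%m%d").date()
    except ValueError:
        return None
    return PathParts(match.group("engine"), day, match.group("table"))


if __name__ == "__main__":
    parts = parse_source_path("/postgres_20220225/mytable.csv")
    print(parts.engine, parts.date.isoformat(), parts.table)
```

Returning None for paths that do not match makes it easy to spot or drop records that came from unexpected files, instead of failing the whole run.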

Let me know if this would work for you.
