
Dynamic folder creation

Dinesh
Level 1

Hi Team,

I am trying to fetch files from an established S3 connection, and I have multiple sub-folders within it as the input for my inference step.


Example :

year-2024/month-01/day-06/hour-08/file1
year-2024/month-01/day-06/hour-08/file2
year-2024/month-01/day-06/hour-09/file6 (path will change over time)

These are example paths in S3. New subfolders are created dynamically under the input folder. I understand we can set up scenario triggers on dataset changes to get notified. So I have 2 questions:

1.) How do we run the model inference based on an S3 folder/dataset change?
2.) How can we create dynamic output folders in S3, mirroring the input folder paths (which keep changing periodically)?

 

A new file arrives every 5 minutes, so inference should run as soon as the new batch file lands, and the output should be stored in a separate, dynamically created folder.

1 Reply
Turribeach

I am not sure the dataset change trigger works for S3 buckets; you should test this. I believe it doesn't, since S3 is not a regular file system but an object store, so Dataiku can't easily check for changes. If it doesn't work, you will have to run a scenario every 5 minutes or so to check for new files. You can use the list_paths_in_partition() method to list all objects in your folder. I wrote this post showing exactly how to make a scenario execute conditionally when there are files to process. To create a dynamic output folder, you simply write the output to the S3 folder using a dynamic name in your file name/path name, e.g. with a YYYY-MM-DD pattern. You will need to use a Python recipe for this.
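A minimal sketch of both ideas, assuming the input layout from the question. The folder names ("raw_s3", "predictions_s3") and the filename "scored.csv" are hypothetical placeholders; only Folder(), list_paths_in_partition() and get_writer() are actual Dataiku Python API calls:

```python
from datetime import datetime, timezone


def new_files(current_paths, seen_paths):
    """Return paths present now but not in the previously recorded set,
    so the scenario can decide whether there is anything to process."""
    return sorted(set(current_paths) - set(seen_paths))


def dynamic_output_path(ts, filename):
    """Build an output sub-path mirroring the input layout
    year-YYYY/month-MM/day-DD/hour-HH/<filename>."""
    return ts.strftime("year-%Y/month-%m/day-%d/hour-%H/") + filename


# Inside a Python recipe or scenario step it would look roughly like this
# (commented out here because it only runs inside Dataiku):
#
#   import dataiku
#   inp = dataiku.Folder("raw_s3")            # hypothetical input folder
#   pending = new_files(inp.list_paths_in_partition(), previously_seen)
#   if pending:
#       out = dataiku.Folder("predictions_s3")  # hypothetical output folder
#       path = dynamic_output_path(datetime.now(timezone.utc), "scored.csv")
#       with out.get_writer(path) as w:
#           w.write(scored_csv_bytes)
```

Persisting the "previously seen" list (e.g. in a dataset or project variable) between scenario runs is what lets the scenario skip runs when nothing new has arrived.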
