
Trigger on new file addition on S3 path

sj0071992
Level 2

Hi Team,

 

Is there a way to define a trigger so that whenever a new file is added to my S3 path, my workflow runs only for that file instead of for all the files in that path?

 

Thanks in advance.

AlexT
Dataiker

@sj0071992 

It would help to know a bit more about your use case here.

How often do new files arrive, and how many do you expect each time? Daily? Hourly?

Are files timestamped, or put in a path that could perhaps be used as a partition? Can old files be updated after they were initially created? 

There is a scenario trigger that fires when a dataset is modified (the "Dataset modified" trigger).

 


This trigger performs an S3 enumeration and detects whether anything has changed since the last enumeration, based on a calculated hash. However, the exact names of the files that changed are not available to the scenario, so you would need some logic of your own, for example storing the timestamp of the last processed file in a project variable. Depending on the structure of the files, partitioning may also be an option: for example, use an hourly partition and build only the last hour on each run.
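To illustrate the "last processed timestamp" idea, here is a minimal sketch in plain Python. It is not a DSS API: `select_new_files` is a hypothetical helper, and it assumes you already have a listing of `(key, last_modified)` pairs for the S3 path and the previously stored timestamp (for instance read from a project variable, which is not shown here).

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple


def select_new_files(
    listing: Iterable[Tuple[str, datetime]],
    last_processed: Optional[datetime],
) -> Tuple[List[str], Optional[datetime]]:
    """Return the keys modified after `last_processed` and the new watermark.

    `listing` is assumed to hold (key, last_modified) pairs for the S3 path.
    `last_processed` is the timestamp saved by the previous run, or None on
    the first run (in which case every file counts as new).
    """
    new_items = [
        (key, modified)
        for key, modified in listing
        if last_processed is None or modified > last_processed
    ]
    # Oldest first, so the last element carries the newest timestamp.
    new_items.sort(key=lambda item: (item[1], item[0]))

    keys = [key for key, _ in new_items]
    # Keep the old watermark when nothing new was found.
    watermark = new_items[-1][1] if new_items else last_processed
    return keys, watermark


if __name__ == "__main__":
    # Hypothetical listing; in practice this would come from enumerating S3.
    listing = [
        ("input/a.csv", datetime(2021, 10, 21, 18, 0, tzinfo=timezone.utc)),
        ("input/b.csv", datetime(2021, 10, 21, 19, 30, tzinfo=timezone.utc)),
        ("input/c.csv", datetime(2021, 10, 21, 20, 15, tzinfo=timezone.utc)),
    ]
    previous = datetime(2021, 10, 21, 18, 0, tzinfo=timezone.utc)
    keys, watermark = select_new_files(listing, previous)
    print(keys, watermark)
```

The scenario would then process only the returned keys and save the new watermark for the next run. Note that an old file that is overwritten gets a newer modification time and is picked up again, and that the strict `>` comparison would skip a file sharing the exact timestamp of the stored watermark, so whether this is safe depends on how your files arrive.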

 

 
