Trigger on new file addition on S3 path

sj0071992

Hi Team,

Is there a way to define a trigger so that whenever a new file is added to my S3 path, my workflow runs only for that file instead of for all the files in the path?

Thanks in Advance

Answers

  • Alexandru

    @sj0071992

    It would help to know a bit more about your use case here.

    How often do new files arrive, and how many do you expect each time? Daily? Hourly?

    Are files timestamped, or put in a path that could perhaps be used as a partition? Can old files be updated after they were initially created?

    There is a scenario trigger that fires when a dataset is modified (the "Dataset modified" trigger).

    This trigger performs an S3 enumeration and detects whether anything has changed since the last enumeration, based on a calculated hash. However, the exact names of the files that changed are not available to the scenario, so you would need some logic of your own, for example storing a project variable with the timestamp of the last processed file. Depending on how the files are structured, partitioning could also be an option, e.g. an hourly partition where only the last hour is built on each run.
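
The "last processed timestamp" idea could look roughly like the sketch below. It is only an illustration under a few assumptions: `select_new_files` is a hypothetical helper, the `listing` of (key, last-modified) pairs stands in for whatever S3 listing you use, and the plain `variables` dict stands in for the project variable that would hold the watermark between scenario runs.

```python
from datetime import datetime, timezone


def select_new_files(listing, last_processed_iso):
    """Return (new_keys, new_watermark_iso).

    listing: iterable of (key, last_modified) pairs, last_modified being a
             timezone-aware datetime (hypothetical stand-in for an S3 listing).
    last_processed_iso: ISO-8601 timestamp of the last processed file,
             or None on the very first run.
    """
    watermark = (
        datetime.fromisoformat(last_processed_iso) if last_processed_iso else None
    )
    # Keep only files modified strictly after the stored watermark.
    new_files = [
        (key, modified)
        for key, modified in listing
        if watermark is None or modified > watermark
    ]
    new_files.sort(key=lambda item: item[1])  # oldest first
    if new_files:
        watermark = new_files[-1][1]  # advance to the newest file seen
    new_keys = [key for key, _ in new_files]
    return new_keys, (watermark.isoformat() if watermark else None)


# Example data, made up for illustration only.
listing = [
    ("incoming/a.csv", datetime(2021, 10, 21, 9, 0, tzinfo=timezone.utc)),
    ("incoming/b.csv", datetime(2021, 10, 21, 11, 0, tzinfo=timezone.utc)),
    ("incoming/c.csv", datetime(2021, 10, 21, 12, 30, tzinfo=timezone.utc)),
]
# Stand-in for the project variable holding the last processed timestamp.
variables = {"last_processed": "2021-10-21T10:00:00+00:00"}

new_keys, variables["last_processed"] = select_new_files(
    listing, variables["last_processed"]
)
print(new_keys)
print(variables["last_processed"])
```

Note that this compares modification times, so a file that is updated after it was first processed would be picked up again, and a file with exactly the same timestamp as the stored watermark would be skipped. Whether that is acceptable depends on the answers to the questions above.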
