Trigger Scenarios from Marker file

WH (Registered, Posts: 17)

Docs mention that:

Optionally, for filesystem-like datasets, it is possible to specify a file name as a “marker” file whose changing is understood as “the data has changed”. When a marker file is specified, changing the other files of the data doesn’t activate the dataset modification trigger. This makes it possible to prevent the trigger from activating while the dataset files are being modified, and protects against situations where refreshing of a dataset can hang.

How do I specify this marker file?

My dataset is a Box folder dataset. I can already run scenarios using the dataset modification trigger; however, I would now like to specify a marker file.

Thanks

Best Answer

  • fchataigner2 (Dataiker, Posts: 355)
    Answer ✓

    Hi,

    The marker file (a.k.a. the "change tracking" field in the dataset's Settings > Advanced tab) isn't available for custom FS providers such as the one from the Box plugin.

    If the marker file is only meant to trigger the scenario, you can probably create a dataset containing just that single file and use it in the "trigger on dataset change" trigger instead of the dataset holding the data.

Answers

  • Sajid (Partner, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered, Posts: 17)

    Hi @ocean_rhythm, @fchataigner2,

    I have a managed folder with an S3 connection. I want to trigger a scenario whenever anything in a particular folder is modified. I have entered that folder name as the marker file. What should I do next? When setting up the scenario, how do I specify the marker file? Do I have to use a custom scenario?

  • fchataigner2 (Dataiker, Posts: 355)

    For a folder backed by S3 files, you don't need a marker file. Simply add that folder to the "trigger on dataset change" trigger. DSS will react when the contents of the folder change, where a change is any of the following: a file is added or removed, a file's size changes, or a file's last modification time changes.
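    To make that change rule concrete, here is a minimal Python sketch that compares two listings of a folder, where each file is recorded with its size and last modification time. It only illustrates the criterion described above; it is not how DSS implements the trigger internally, and the names `Snapshot`, `diff_snapshots` and `has_changed` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format: file path -> (size_in_bytes, last_modified_epoch_ms).
Snapshot = Dict[str, Tuple[int, int]]


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, List[str]]:
    """Classify the differences between two listings of the same folder.

    A file counts as changed if it was added, removed, or if its size or
    last modification time differs between the two snapshots.
    """
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    # Comparing the (size, mtime) tuples covers both "size changed" and "mtime changed".
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return {"added": added, "removed": removed, "modified": modified}


def has_changed(old: Snapshot, new: Snapshot) -> bool:
    """True if at least one file was added, removed or modified."""
    return any(diff_snapshots(old, new).values())


if __name__ == "__main__":
    before = {"/a.csv": (100, 1_000), "/b.csv": (200, 2_000)}
    after = {"/a.csv": (100, 1_000), "/b.csv": (250, 2_500), "/c.csv": (50, 3_000)}
    print(diff_snapshots(before, after))
    # {'added': ['/c.csv'], 'removed': [], 'modified': ['/b.csv']}
```

    Storing the whole (size, mtime) pair per path keeps the comparison to a single equality check per file.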

  • Sajid (Partner, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered, Posts: 17)

    Thanks @fchataigner2,

    It works. Now I'd like to get information about the latest modification. For instance, if file ABC has been added to my folder, I want to process it using its path and other details. Where can I get such details?

    I was able to get the paths of all the contents of the folder with the list_paths_in_partition() method in Python. This listed many paths, including those inside subfolders, but I was able to get what I needed with some extra logic. I just want to know if there is a more direct method: for example, one that tells me ABC is the most recently added or modified file in my folder, or, if 4 files have been added to the folder, lets me process those 4 programmatically or with Dataiku features.

    Thanks

  • fchataigner2 (Dataiker, Posts: 355)

    You can get the last modification date with dataiku.Folder.get_path_details() (see https://doc.dataiku.com/dss/latest/python-api/managed_folders.html#dataiku.Folder.get_path_details). To find which files exist, there is no alternative to list_paths_in_partition().
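    Combining those two calls, here is a minimal Python sketch of the "process only the new files" step asked about above: list every path, look up its details, and keep the files modified after a cutoff, newest first. The function name `files_modified_since`, the `cutoff_ms` parameter, and the assumption that the dict returned by `get_path_details()` carries a `lastModified` value in epoch milliseconds and a `directory` flag are illustrative assumptions; check the returned dict on your DSS version. A stand-in folder class lets the sketch run outside DSS.

```python
from typing import List, Tuple


def files_modified_since(folder, cutoff_ms: int) -> List[Tuple[str, int]]:
    """Return (path, lastModified) for files modified after cutoff_ms, newest first.

    `folder` is expected to behave like dataiku.Folder, i.e. to provide
    list_paths_in_partition() and get_path_details(path).
    ASSUMPTION: get_path_details() returns a dict with a "lastModified" key
    (epoch milliseconds) and a "directory" flag.
    """
    recent = []
    for path in folder.list_paths_in_partition():
        details = folder.get_path_details(path)
        if details.get("directory"):
            continue  # keep files only
        modified = details.get("lastModified")
        if modified is not None and modified > cutoff_ms:
            recent.append((path, modified))
    # Newest first, so recent[0] is the most recently added or modified file.
    recent.sort(key=lambda item: item[1], reverse=True)
    return recent


if __name__ == "__main__":
    # Hypothetical stand-in for dataiku.Folder so the sketch runs outside DSS.
    class FakeFolder:
        def __init__(self, files):
            self._files = files  # path -> lastModified (epoch ms)

        def list_paths_in_partition(self):
            return list(self._files)

        def get_path_details(self, path):
            return {"directory": False, "lastModified": self._files[path]}

    folder = FakeFolder({"/old.csv": 1_000, "/sub/ABC.csv": 5_000, "/new.csv": 3_000})
    print(files_modified_since(folder, cutoff_ms=2_000))
    # [('/sub/ABC.csv', 5000), ('/new.csv', 3000)]
```

    Inside DSS you would pass a real `dataiku.Folder` object instead of the stand-in. Choosing and persisting the cutoff (for example, the time of the previous scenario run) is left to you; the sketch does not handle it.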
