Trigger Scenarios from Marker file

Solved!
ocean_rhythm
Level 3

Docs mention that:

Optionally, for filesystem-like datasets, it is possible to specify a file name as a "marker" file whose changing is understood as "the data has changed". When a marker file is specified, changing the other files of the data doesn't activate the dataset modification trigger. This makes it possible to prevent the trigger from activating while the dataset files are being modified, and protects against situations where refreshing a dataset can hang.

How do I specify this marker file?

My dataset is a Box folder dataset; I can already run scenarios using dataset modification trigger, however I would now like to specify a marker file.

Thanks

1 Solution
fchataigner2
Dataiker

Hi,

The marker file (a.k.a. the "change tracking" field in the dataset's Settings > Advanced tab) isn't available on custom FS providers like the one from the Box plugin.

If the marker file is only meant to be used to trigger scenarios, you can probably make a dataset out of that single file and use it in the "trigger on dataset change" instead of the dataset holding the data.


6 Replies

Sajid_Khan
Level 3

Hi @ocean_rhythm @fchataigner2,

I have a managed folder on an S3 connection. I want to trigger a scenario when any modification takes place in a particular folder, and I have added that folder's name as the marker file. What should I do next? When setting up a scenario, how can I set the marker file? Do I have to do this using a custom scenario?

fchataigner2
Dataiker

For a folder backed by S3 files, you don't need a marker file. Simply add that folder in the "trigger on dataset change". DSS will react if the contents of the folder change, where a change is the addition or removal of a file, a change in a file's size, or a change in a file's last modification time.

Sajid_Khan
Level 3

Thanks @fchataigner2,

It works. Now I want to get information about the latest modification. For instance, file ABC has been added to my folder, and I want to process it using its path and other details. Where can I get such details?

I was able to get the paths of all the contents of the folder with the list_paths_in_partition() method in Python. This listed many paths, including those inside subfolders, but I was able to get what I needed with some logic. I just want to know if there is a more direct method: for example, to find that ABC is the latest added or modified file in my folder, or, say, that 4 files have been added to the folder so I can process those 4 programmatically or using Dataiku functionality.
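As an aside, the kind of filtering described above can be sketched as follows. This is a minimal illustration, not DSS-specific code: the dataiku calls only run inside DSS, so they are shown as comments, and the path shapes ("/name" vs "/sub/name") are an assumption based on typical list_paths_in_partition() output.

```python
# Sketch: keep only top-level files from a managed folder listing.
# Inside DSS you would obtain the paths like this (assumption):
#   import dataiku
#   folder = dataiku.Folder("my_folder")
#   paths = folder.list_paths_in_partition()

def top_level_files(paths):
    """Keep only paths directly under the folder root (no subfolder component)."""
    return [p for p in paths if p.lstrip("/").count("/") == 0]

# Mocked listing, including files nested in subfolders:
paths = ["/ABC.csv", "/DEF.csv", "/archive/old.csv", "/archive/2021/older.csv"]
print(top_level_files(paths))  # ['/ABC.csv', '/DEF.csv']
```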

Thanks

fchataigner2
Dataiker

You can get the last modification date with https://doc.dataiku.com/dss/latest/python-api/managed_folders.html#dataiku.Folder.get_path_details. For finding which files exist, there is no alternative to list_paths_in_partition().
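Combining the two calls above could look like the sketch below. It only runs inside DSS, so the Folder calls are mocked here, and the "lastModified" key (an epoch timestamp) is an assumption about the shape of the details dict returned by get_path_details().

```python
# Sketch: find files modified after a given cutoff by combining
# list_paths_in_partition() with get_path_details(). Inside DSS (assumption):
#   import dataiku
#   folder = dataiku.Folder("my_folder")
#   recent = modified_since(folder.list_paths_in_partition(),
#                           folder.get_path_details, cutoff_ms)

def modified_since(paths, get_details, cutoff_ms):
    """Return paths whose last-modified timestamp is newer than cutoff_ms."""
    recent = []
    for p in paths:
        details = get_details(p)
        if details.get("lastModified", 0) > cutoff_ms:
            recent.append(p)
    return recent

# Mocked stand-ins for the Folder methods:
mock_paths = ["/ABC.csv", "/old.csv"]
mock_details = {"/ABC.csv": {"lastModified": 2000},
                "/old.csv": {"lastModified": 1000}}
print(modified_since(mock_paths, mock_details.get, 1500))  # ['/ABC.csv']
```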

Sajid_Khan
Level 3

Thank you @fchataigner2
