Scenario Custom Trigger Tips
I have been working on writing a custom Python trigger script that enables flexible time-based scheduling. For example, running a scenario at multiple times during business hours on weekdays and then on a different schedule on the weekend.
One could build some versions of such a schedule with a bunch of built-in time-based triggers. In my case that would have required close to 500 triggers, so it wasn't feasible.
In the process of writing this, I learned a few things about custom Python triggers. Thought I would share a few notes in the reply below.
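To make the idea concrete, here is a minimal sketch of the kind of schedule check such a trigger script could run. The times in WEEKDAY_TIMES and WEEKEND_TIMES, the helper names, and the way last_check is obtained are all assumptions for illustration, not the script described in this post.

```python
from datetime import datetime, time

# Hypothetical schedule: several runs on weekdays, a single run on weekends.
WEEKDAY_TIMES = [time(9, 0), time(12, 0), time(15, 0), time(17, 0)]
WEEKEND_TIMES = [time(10, 0)]


def scheduled_times(day):
    """Return the run times for the given date (Mon=0 .. Sun=6)."""
    return WEEKDAY_TIMES if day.weekday() < 5 else WEEKEND_TIMES


def should_fire(now, last_check):
    """True if one of today's scheduled times falls in (last_check, now]."""
    for slot in scheduled_times(now):
        slot_dt = datetime.combine(now.date(), slot)
        if last_check < slot_dt <= now:
            return True
    return False


# Inside a DSS custom trigger the result would typically gate the fire call:
#   from dataiku.scenario import Trigger
#   t = Trigger()
#   if should_fire(datetime.now(), last_check):
#       t.fire()
# How last_check is obtained is left open here; one option (an assumption)
# is to approximate it as "now minus the Run every interval".
```

Keeping the schedule in plain lists means that adding or changing a run time is a one-line edit rather than another trigger in the UI.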
Best Answer
-
Marlan
- The backend.log file (read via Admin > Maintenance) is the place to look for trigger logging (Clement pointed me to this).
- Besides "fire()", triggers have two other methods: "get_trigger()" and "get_trigger_state()".
In my testing, get_trigger() returned the following dict (which maps pretty clearly to the trigger parameters specified in the UI):
{u'name': u'Trigger Name',
u'graceDelaySettings': {u'delay': 0, u'checkAgainAfterGraceDelay': False},
u'delay': 30,
u'params': {u'code': 'your script here', u'envSelection': {u'envMode': u'INHERIT'}},
u'active': True,
u'type': u'custom_python',
u'id': u'Ju7jasV7'}
and get_trigger_state() returned None.
- The "Run every (seconds)" parameter (same as the "delay" item in the above dict) specifies a timer that starts after the trigger script completes. In other words, however long the trigger script takes isn't counted as part of the interval (particularly relevant for me as my script sometimes has a delay before firing).
- You can pass a dict of values in the "fire()" method that you can retrieve in a Scenario step via Scenario().get_trigger_params(). This might be helpful if you want a step to have information about the triggering event. For example, trigger_handle.fire(params={'key1':'val1', 'key2':'val2'}).
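The notes above on get_trigger() and fire(params=...) can be combined roughly as follows. This is only a sketch: fire_with_context and FakeTrigger are hypothetical names, and FakeTrigger is a stand-in so the snippet runs outside DSS. In a real trigger script the object would come from dataiku.scenario.

```python
def fire_with_context(trigger, reason):
    """Fire the trigger, passing along some context for the scenario steps."""
    settings = trigger.get_trigger()  # dict like the one shown above
    params = {
        "reason": reason,
        "trigger_name": settings.get("name"),
        "delay": settings.get("delay"),
    }
    trigger.fire(params=params)
    return params


class FakeTrigger:
    """Stand-in for local experiments only; not part of the Dataiku API."""

    def __init__(self):
        self.fired_with = None

    def get_trigger(self):
        return {"name": "Trigger Name", "delay": 30, "type": "custom_python"}

    def fire(self, params=None):
        self.fired_with = params


trigger = FakeTrigger()
fire_with_context(trigger, "weekday 09:00 slot")
print(trigger.fired_with)

# In DSS the trigger side would use the real object instead:
#   from dataiku.scenario import Trigger
#   fire_with_context(Trigger(), "weekday 09:00 slot")
# and a Python step in the scenario could read the values back with:
#   from dataiku.scenario import Scenario
#   params = Scenario().get_trigger_params()
```

Passing the trigger object in as an argument keeps the logic testable without a running DSS instance.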
Answers
-
Awesome! Thanks for sharing your knowledge, Marlan!
-
Thanks, Marlan, for this very valuable information!
Do you know what the 'Grace Delay' is for? If I set it to 10 seconds and set the CheckAgainAfterGraceDelay parameter to true, how does the trigger work?
-
Marlan
Hi @rona,
I asked about this a while back and below is what I received from Matthieu Scordia from Dataiku. I'm still not sure I completely understand all the combinations, but I thought it was helpful nonetheless.
Marlan
Examples:
run every 900s
grace delay 120s
recheck on
you would have
t-0 : trigger runs, no change detected
t-900 : trigger runs, no change detected
t-1257 : dataset is changed (rebuilt, or files changed by external source for external datasets)
t-1800 : trigger runs, detects changes, prepares grace delay sequence
t-1920 : trigger runs again (because of recheck on), no change detected, launches scenario run
t-3000 : scenario done
t-3630 : trigger runs, no change detected
....
The "recheck" option controls whether DSS runs the trigger again at t-1920. The goal is to wait for the dataset to "stabilize". Imagine a dataset that is updated over the course of 30s (because of many files, or big files, or slow network, or whatever...), and you have a grace delay of 10s:
t-0 : trigger runs, no change detected
t-900 : trigger runs, no change detected
t-1797 : dataset starts changing
t-1800 : trigger runs, detects changes, prepares grace delay sequence
t-1810 : trigger runs again (because of recheck on), detects more changes, resets grace delay sequence
t-1820 : trigger runs again (because of recheck on), detects more changes, resets grace delay sequence
t-1827 : dataset stops changing
t-1830 : trigger runs again (because of recheck on), no change detected, launches scenario run
t-3000 : scenario done
t-3720 : trigger runs, no change detected
....
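The two timelines above can be reproduced with a toy model. This is only a reading of the examples with recheck on, not how DSS is implemented: it assumes changes are discrete events, that a check detects any change since the previous check, and that the trigger script itself takes no time. The function name and the change timestamps in the second call are made up for illustration.

```python
def simulate_launch(run_every, grace_delay, change_times, horizon=10000):
    """Return (launch_time, log) for a simplified 'recheck on' trigger.

    change_times: seconds at which the dataset changes (hypothetical events).
    The check at t=0 is treated as the baseline with no change detected.
    """
    log = []
    prev, t = 0, run_every
    in_grace = False
    while t <= horizon:
        changed = any(prev < c <= t for c in change_times)
        if changed:
            # A change starts the grace delay, or restarts it on a recheck.
            log.append((t, "changes detected, grace delay (re)started"))
            in_grace = True
            prev, t = t, t + grace_delay
        elif in_grace:
            # A quiet recheck after the grace delay launches the scenario.
            log.append((t, "no change detected, scenario launched"))
            return t, log
        else:
            log.append((t, "no change detected"))
            prev, t = t, t + run_every
    return None, log


# First example: run every 900s, grace delay 120s, one change at t-1257.
print(simulate_launch(900, 120, [1257])[0])              # 1920
# Second example: grace delay 10s, files arriving at assumed times.
print(simulate_launch(900, 10, [1797, 1805, 1815])[0])   # 1830
```

The second call shows the "stabilize" behavior: each recheck that still sees new changes pushes the launch back by another grace delay.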
The grace delay is meant to aggregate triggers if they arrive in bulk, for example if you have a second trigger B on the same dataset, with grace delay 100:
t-0 : trigger A runs, no change detected
t-900 : trigger A runs, no change detected
t-1257 : dataset is changed (rebuilt, or files changed by external source for external datasets)
t-1800 : trigger A runs, detects changes, prepares grace delay sequence
t-1870 : trigger B runs, detects changes, prepares grace delay sequence => grace delay of trigger A is dropped (because it ends before the one of trigger B)
t-1970 : trigger B runs again (because of recheck on), no change detected, launches scenario run
t-3000 : scenario done
t-3720 : trigger A runs, no change detected
-
importthepandas
It would be quite swell if we could back into both the project key and the scenario ID with get_trigger(), without having to loop over every scenario and find the runs where the trigger ID is used, in order to check the last run status in a custom trigger routine.
-
Is it possible to edit the 'Repeat every' setting in a custom Python trigger for customers?