
Scenario Custom Trigger Tips

I have been working on writing a custom Python trigger script that enables flexible time-based scheduling. For example, running a scenario at multiple times during business hours on weekdays and then on a different schedule on the weekend.

One could build some version of such a schedule with a set of built-in time-based triggers, but in my case that would have required close to 500 triggers, so it wasn't feasible.
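As a rough illustration of the kind of schedule described above, here is a minimal sketch of the decision logic only. The times, the names `WEEKDAY_TIMES`, `WEEKEND_TIMES`, `should_fire` and `last_check` are all hypothetical, and the sketch assumes the script has some way of knowing when it last checked (how that is stored between runs is left open). It also ignores intervals that cross midnight.

```python
from datetime import datetime, time

# Hypothetical schedule: several runs on weekdays, a single run on weekends.
WEEKDAY_TIMES = [time(9, 0), time(12, 0), time(15, 0), time(17, 30)]
WEEKEND_TIMES = [time(10, 0)]


def due_times(day):
    """Return the scheduled times of day for the given date or datetime."""
    # weekday() is 0 for Monday through 6 for Sunday.
    return WEEKDAY_TIMES if day.weekday() < 5 else WEEKEND_TIMES


def should_fire(now, last_check):
    """True if a scheduled time falls in the interval (last_check, now]."""
    for t in due_times(now):
        scheduled = datetime.combine(now.date(), t)
        if last_check < scheduled <= now:
            return True
    return False


# Inside a DSS custom Python trigger the wiring might look like this
# (not executed here, since it needs the DSS runtime):
#   from dataiku.scenario import Trigger
#   t = Trigger()
#   if should_fire(datetime.now(), last_check):
#       t.fire()
```

Keeping the schedule in one pure function makes it easy to test outside DSS before pasting it into the trigger.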

In the process of writing this, I learned a few things about custom Python triggers. Thought I would share a few notes in the reply below.

4 Replies
Level 4
Author
  • The backend.log file (read via Admin > Maintenance) is the place to look for trigger logging (Clement pointed me to this).
  • Triggers have two methods besides "fire()": "get_trigger()" and "get_trigger_state()".

In my testing, get_trigger() returned the following dict (which maps fairly directly to the trigger parameters specified in the UI):

{u'name': u'Trigger Name',
u'graceDelaySettings': {u'delay': 0, u'checkAgainAfterGraceDelay': False},
u'delay': 30,
u'params': {u'code': 'your script here', u'envSelection': {u'envMode': u'INHERIT'}},
u'active': True,
u'type': u'custom_python',
u'id': u'Ju7jasV7'}

and get_trigger_state() returned None.

  • The "Run every (seconds)" parameter (same as the "delay" item in the above dict) specifies a timer that starts after the trigger script completes. In other words, however long the trigger script takes isn't counted as part of the interval (particularly relevant for me, as my script sometimes has a delay before firing).
  • You can pass a dict of values to the "fire()" method and retrieve it in a Scenario step via Scenario().get_trigger_params(). This might be helpful if you want a step to have information about the triggering event. For example, trigger_handle.fire(params={'key1':'val1', 'key2':'val2'}).
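To make the last note concrete, here is a small sketch of passing context through "fire()". The helper `fire_with_context`, the keys `reason` and `scheduled_for`, and the `FakeTrigger` stand-in are all hypothetical; the stand-in exists only so the example runs outside DSS and is not part of the Dataiku API.

```python
def fire_with_context(trigger, reason, scheduled_for):
    """Fire `trigger`, attaching a small dict that scenario steps can read back."""
    params = {"reason": reason, "scheduled_for": scheduled_for}
    trigger.fire(params=params)
    return params


class FakeTrigger:
    """Hypothetical stand-in for the real trigger handle, for running outside DSS."""

    def __init__(self):
        self.fired_with = None

    def fire(self, params=None):
        # Record what would have been passed on to the scenario.
        self.fired_with = params


fake = FakeTrigger()
fire_with_context(fake, "weekday_schedule", "09:00")

# Inside DSS (not executed here):
#   in the trigger script:  fire_with_context(Trigger(), "weekday_schedule", "09:00")
#   in a scenario Python step:
#       from dataiku.scenario import Scenario
#       params = Scenario().get_trigger_params()
```

In a scenario step, the dict read back with get_trigger_params() should then carry the same keys that the trigger passed in.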
Community Manager

Awesome! Thanks for sharing your knowledge, Marlan!

Lisa, Community Programs Manager at Dataiku

Level 3

Thanks Marlan for this very valuable information!

Do you know what the 'Grace Delay' is used for? If I set it to 10 seconds and set the CheckAgainAfterGraceDelay parameter to true, how does the trigger work?

 

Level 4
Author

Hi @rona,

I asked about this a while back and below is what I received from Matthieu Scordia from Dataiku. Still not sure I completely understand all the combinations but nonetheless thought it was helpful.

Marlan 

 

Examples:

run every 900s

grace delay 120s

recheck on

 

you would have

 

t-0       :  trigger runs, no change detected

t-900   :  trigger runs, no change detected

t-1257 :  dataset is changed (rebuilt, or files changed by external source for external datasets)

t-1800 :  trigger runs, detects changes, prepares grace delay sequence

t-1920 :  trigger runs again (because of recheck on), no change detected, launches scenario run

t-3000 :  scenario done

t-3630 :  trigger runs, no change detected

....

 

The "recheck" option controls whether DSS runs the trigger again at t-1920. The goal is to wait for the dataset to "stabilize". Imagine a dataset that is updated over the course of 30s (because of many files, or big files, or slow network, or whatever...), and you have a grace delay of 10s:

 

t-0       :  trigger runs, no change detected

t-900   :  trigger runs, no change detected

t-1797 :  dataset starts changing

t-1800 :  trigger runs, detects changes, prepares grace delay sequence

t-1810 :  trigger runs again (because of recheck on), detects more changes, resets grace delay sequence

t-1820 :  trigger runs again (because of recheck on), detects more changes, resets grace delay sequence

t-1827 :  dataset stops changing

t-1830 :  trigger runs again (because of recheck on), no change detected, launches scenario run

t-3000 :  scenario done

t-3720 :  trigger runs, no change detected

....
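The two timelines above can be reproduced with a toy model. This is only a sketch of the behavior as described in this post, not DSS's actual implementation: the function name `simulate_grace_delay` and its parameters are hypothetical, a recheck is assumed to "detect changes" whenever it runs before the dataset has stopped changing, and with recheck off the scenario is assumed to launch when the grace delay ends without another trigger run.

```python
def simulate_grace_delay(first_detection, grace_delay, change_end, recheck=True):
    """Return (launch_time, recheck_times) for one grace delay sequence.

    first_detection: time at which the trigger first detects changes
    grace_delay:     grace delay in seconds (must be positive)
    change_end:      time at which the dataset stops changing
    """
    if grace_delay <= 0:
        raise ValueError("grace_delay must be positive in this toy model")

    rechecks = []
    t = first_detection + grace_delay
    if not recheck:
        # Assumption: no second trigger run, launch when the grace delay ends.
        return t, rechecks

    while True:
        rechecks.append(t)
        if t >= change_end:
            # Dataset is stable at this recheck: launch the scenario.
            return t, rechecks
        # More changes detected: reset the grace delay sequence.
        t += grace_delay


# First example: run every 900s, grace delay 120s, change at t-1257.
example_1 = simulate_grace_delay(1800, 120, 1257)
# Second example: grace delay 10s, dataset changing from t-1797 to t-1827.
example_2 = simulate_grace_delay(1800, 10, 1827)
```

Under these assumptions the first call launches at 1920 after a single recheck, and the second launches at 1830 after rechecks at 1810, 1820 and 1830, matching the timelines above.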

 

The grace delay is meant to aggregate triggers if they arrive in bulk. For example, suppose you have a second trigger B on the same dataset, with a grace delay of 100s:

 

 

t-0       :  trigger A runs, no change detected

t-900   :  trigger A runs, no change detected

t-1257 :  dataset is changed (rebuilt, or files changed by external source for external datasets)

t-1800 :  trigger A runs, detects changes, prepares grace delay sequence

t-1870 :  trigger B runs, detects changes, prepares grace delay sequence => grace delay of trigger A is dropped (because it ends before trigger B's)

t-1970 :  trigger B runs again (because of recheck on), no change detected, launches scenario run

t-3000 :  scenario done

t-3720 :  trigger A runs, no change detected
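One way to read the two-trigger timeline above: when several triggers are in their grace delay at once, the sequence that ends last wins. The sketch below encodes just that reading; `aggregated_launch` is a hypothetical helper, and it assumes no further changes are detected on the final recheck.

```python
def aggregated_launch(detections):
    """Return the launch time for overlapping grace delay sequences.

    detections: list of (detection_time, grace_delay) pairs, one per trigger.
    The sequence that ends last is kept; earlier-ending ones are dropped.
    """
    return max(detected + grace for detected, grace in detections)


# Trigger A detects at t-1800 with a 120s grace delay (ends at 1920),
# trigger B detects at t-1870 with a 100s grace delay (ends at 1970).
launch = aggregated_launch([(1800, 120), (1870, 100)])
```

With the numbers from the timeline, A's sequence would end at 1920 and B's at 1970, so the scenario launches once, at 1970.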