Scenario Custom Trigger Tips

Marlan
Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 317 Neuron

I have been working on writing a custom Python trigger script that enables flexible time-based scheduling. For example, running a scenario at multiple times during business hours on weekdays and then on a different schedule on the weekend.

One could build some versions of such a schedule from a set of built-in time-based triggers. In my case that would have required close to 500 triggers, so it wasn't feasible.

In the process of writing this, I learned a few things about custom Python triggers. Thought I would share a few notes in the reply below.


Best Answer

  • Marlan
    • The backend.log file (read via Admin > Maintenance) is the place to look for trigger logging (Clement pointed me to this).
    • Besides "fire()", the trigger object has two other methods: "get_trigger()" and "get_trigger_state()".

    In my testing, get_trigger() returned the following dict (which maps fairly directly onto the trigger parameters specified in the UI):

    {u'name': u'Trigger Name',
    u'graceDelaySettings': {u'delay': 0, u'checkAgainAfterGraceDelay': False},
    u'delay': 30,
    u'params': {u'code': 'your script here', u'envSelection': {u'envMode': u'INHERIT'}},
    u'active': True,
    u'type': u'custom_python',
    u'id': u'Ju7jasV7'}

    and get_trigger_state() returned None.

    • The "Run every (seconds)" parameter (same as the "delay" item in the above dict) specifies a timer that starts after the trigger script completes. In other words, however long the trigger script takes isn't counted as part of the interval (particularly relevant for me as my script sometimes has a delay before firing).
    • You can pass a dict of values to the "fire()" method and retrieve it in a Scenario step via Scenario().get_trigger_params(). This might be helpful if you want a step to have information about the triggering event. For example, trigger_handle.fire(params={'key1': 'val1', 'key2': 'val2'}).
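To make the notes above concrete, here is a minimal sketch of a time-based custom trigger that passes a value through fire(). It is not Marlan's actual script. The schedule lists, the CHECK_WINDOW value, the due_slot helper and the "scheduled_time" key are all hypothetical names chosen for illustration, and the window is assumed to equal the trigger's "Run every (seconds)" setting.

```python
from datetime import datetime, time, timedelta

try:
    from dataiku.scenario import Trigger  # importable only inside DSS
except ImportError:
    Trigger = None  # allows the scheduling logic to be exercised outside DSS

# Hypothetical schedule: adjust to the times you actually need.
WEEKDAY_TIMES = [time(9, 0), time(12, 0), time(15, 0)]
WEEKEND_TIMES = [time(10, 0)]

# Assumed to equal the trigger's "Run every (seconds)" setting.
CHECK_WINDOW = timedelta(seconds=30)


def due_slot(now, window=CHECK_WINDOW):
    """Return the scheduled time that started within the last `window`, else None."""
    slots = WEEKDAY_TIMES if now.weekday() < 5 else WEEKEND_TIMES
    for slot in slots:
        scheduled = datetime.combine(now.date(), slot)
        if scheduled <= now < scheduled + window:
            return slot
    return None


if Trigger is not None:
    slot = due_slot(datetime.now())
    if slot is not None:
        # The params dict is what a scenario step would read back with
        # Scenario().get_trigger_params().
        Trigger().fire(params={"scheduled_time": slot.strftime("%H:%M")})
```

Because the "Run every" timer starts only after the script completes, successive checks are spaced slightly more than the interval apart, so a window exactly equal to the interval could occasionally miss a slot. Widening the window reduces that risk at the cost of a possible double fire.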

Answers

  • LisaB
    LisaB Dataiker, Alpha Tester Posts: 208 Dataiker

    Awesome! Thanks for sharing your knowledge, Marlan!

  • rona
    rona Registered Posts: 47 ✭✭✭✭✭

    Thanks, Marlan, for this very valuable information!

    Do you know what the 'Grace Delay' is for? If I set it to 10 seconds and set the checkAgainAfterGraceDelay parameter to true, how does the trigger behave?

  • Marlan

    Hi @rona,

    I asked about this a while back and below is what I received from Matthieu Scordia from Dataiku. Still not sure I completely understand all the combinations but nonetheless thought it was helpful.

    Marlan

    Examples:

    With run every 900s, grace delay 120s, and recheck on, you would have:

    t-0 : trigger runs, no change detected

    t-900 : trigger runs, no change detected

    t-1257 : dataset is changed (rebuilt, or files changed by external source for external datasets)

    t-1800 : trigger runs, detects changes, prepares grace delay sequence

    t-1920 : trigger runs again (because of recheck on), no change detected, launches scenario run

    t-3000 : scenario done

    t-3630 : trigger runs, no change detected

    ....

    The "recheck" option controls whether DSS runs the trigger again at t-1920. The goal is to wait for the dataset to "stabilize". Imagine a dataset that is updated over the course of 30s (because of many files, or big files, or slow network, or whatever...), and you have a grace delay of 10s:

    t-0 : trigger runs, no change detected

    t-900 : trigger runs, no change detected

    t-1797 : dataset starts changing

    t-1800 : trigger runs, detects changes, prepares grace delay sequence

    t-1810 : trigger runs again (because of recheck on), detects more changes, resets grace delay sequence

    t-1820 : trigger runs again (because of recheck on), detects more changes, resets grace delay sequence

    t-1827 : dataset stops changing

    t-1830 : trigger runs again (because of recheck on), no change detected, launches scenario run

    t-3000 : scenario done

    t-3720 : trigger runs, no change detected

    ....

    The grace delay is meant to aggregate triggers if they arrive in bulk, for example if you have a second trigger B on the same dataset, with grace delay 100:

    t-0 : trigger A runs, no change detected

    t-900 : trigger A runs, no change detected

    t-1257 : dataset is changed (rebuilt, or files changed by external source for external datasets)

    t-1800 : trigger A runs, detects changes, prepares grace delay sequence

    t-1870 : trigger B runs, detects changes, prepares grace delay sequence => grace delay of trigger A is dropped (because it ends before the one of trigger B)

    t-1970 : trigger B runs again (because of recheck on), no change detected, launches scenario run

    t-3000 : scenario done

    t-3720 : trigger A runs, no change detected
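
The first two timelines above can be reproduced with a small toy model. This is only a sketch of the behaviour as described in this reply, not DSS's actual implementation: launch_time and the detects_change callback are hypothetical names, and the behaviour with recheck off is an assumption.

```python
def launch_time(first_detection, grace_delay, detects_change, recheck=True):
    """Return the time at which the scenario run would start in this toy model.

    detects_change(t) is a hypothetical callback: True if a trigger check at
    time t would still see changes.
    """
    t = first_detection + grace_delay
    if not recheck:
        # Assumption: with recheck off, the run starts when the grace delay ends.
        return t
    while detects_change(t):
        # More changes seen: the grace delay sequence is reset.
        t += grace_delay
    return t


# First timeline: grace delay 120s, nothing changes after the detection at 1800.
first = launch_time(1800, 120, lambda t: False)

# Second timeline: grace delay 10s, dataset keeps changing from 1797 to 1827.
second = launch_time(1800, 10, lambda t: 1797 <= t <= 1827)

print(first, second)  # 1920 1830
```

The two results match the scenario launch times in the timelines (t-1920 and t-1830). The third timeline, with two triggers, is not covered by this model.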

  • importthepandas
    importthepandas Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 115 Neuron

    It would be quite swell if we could get both the project key and the scenario ID from get_trigger(), without having to loop over every scenario and find the runs where the trigger ID was used, in order to check the last status in a custom trigger routine.
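
For reference, the loop-over-every-scenario workaround mentioned above might look roughly like the sketch below. The helper names are hypothetical, and the client calls (list_project_keys, get_project, list_scenarios, get_scenario, get_settings, get_raw) are assumed from the public Dataiku API client, as is the "triggers" key in the raw settings; check them against your DSS version.

```python
def find_trigger_owner(trigger_id, scenarios):
    """Return (project_key, scenario_id) for the scenario owning trigger_id.

    `scenarios` is an iterable of (project_key, scenario_id, triggers) tuples,
    where `triggers` is a list of dicts shaped like the get_trigger() output.
    """
    for project_key, scenario_id, triggers in scenarios:
        if any(t.get("id") == trigger_id for t in triggers):
            return project_key, scenario_id
    return None


def iter_scenario_triggers(client):
    """Yield (project_key, scenario_id, triggers) for every scenario.

    Assumes `client` behaves like the object returned by dataiku.api_client().
    """
    for project_key in client.list_project_keys():
        project = client.get_project(project_key)
        for item in project.list_scenarios():
            settings = project.get_scenario(item["id"]).get_settings()
            yield project_key, item["id"], settings.get_raw().get("triggers", [])
```

Inside a trigger script this would be called along the lines of find_trigger_owner(trigger_handle.get_trigger()["id"], iter_scenario_triggers(dataiku.api_client())). It still scans every scenario on the instance, so it does not remove the cost the post is pointing at.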

  • abder_
    abder_ Dataiku DSS Core Designer, Registered Posts: 1 ✭✭✭

    Is it possible to edit the 'Repeat every' setting in a custom Python trigger for customers?
