
Scenario triggered by an external change

Solved!
davidmakovoz
Level 4

I am trying to create a scenario that is triggered by an external change to a dataset. 

Specifically, I have a dataset created from a Redshift table, and a recipe that processes this dataset.

I want to be able to trigger the recipe every time a change is made to the Redshift table. Any change to that table is made outside of Dataiku: it's a reporting table, and new rows are added at random times.

I set up a trigger that monitors dataset changes, but it doesn't work. I remove and add rows to the table by running Redshift queries outside of Dataiku, and nothing happens.

I was thinking of using the "SQL query change" trigger to monitor the number of rows in the table. What is the best way to keep track of the number of rows, i.e. the change? Should I use an environment variable of some sort? What is the best practice?

Thank you


7 Replies
davidmakovoz
Level 4
Author

Are there any examples of "SQL query change" triggers in the documentation or anywhere else on the Dataiku website?

Ignacio_Toledo

Hi @davidmakovoz. I was able to find this link with Google: https://gist.github.com/alexcombessie/64e5508d87e47685f9b5955cb18a9ee2

It has a "coded" trigger that works on SQL dataset changes. It might help.

I didn't find any explicit examples in the documentation or the Academy.

ben_p
Neuron

Here is an example SQL query trigger from one of our projects. This trigger looks at a table with a date suffix in its name, which is created every morning by another job:

SELECT max(cast(parse_date('%Y%m%d', substr(table_id, 13)) as timestamp)) as max_date
FROM `dataset.__TABLES__`
WHERE table_id like '%ga_sessions_%'
  and table_id not like '%intraday%'
  and row_count > 50000

Does this help?

Ben

davidmakovoz
Level 4
Author

Ben,

Thank you for the example. I have a conceptual question: your query returns a value, a date. How does that trigger the scenario? You'd expect a trigger to involve a condition, e.g. a change in a dataset or some time having passed; the custom Python trigger has an explicit if statement that calls t.fire().

Where is the condition here? Your query returns a date, or comes back empty if row_count < 50000 (by the way, where does row_count come from?). How does it trigger the steps?

Thank you

Jediv
Dataiker

Hi,

If I understand correctly, your goal is:

I want my scenario to run when rows are added to or removed from a SQL dataset.

You can simply write the following query in the SQL query trigger:

SELECT count(*) FROM my_table;

The SQL query trigger periodically runs that query and keeps track of the last result. If a new run returns a result that doesn't match the stored one, the trigger fires and the scenario runs.
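To see why a simple count works, the compare-and-fire mechanism can be sketched in plain Python. This is only an illustration of the logic, not Dataiku's actual implementation; make_change_trigger and run_query are hypothetical names:

```python
# Illustrative sketch of a "SQL query change" trigger loop.
# run_query() stands in for executing the configured SQL query.

def make_change_trigger(run_query):
    state = {"last": None}

    def check():
        """Run the query; return True (i.e. fire) when the result changed."""
        result = run_query()
        changed = state["last"] is not None and result != state["last"]
        state["last"] = result
        return changed

    return check

# Simulated polls: row counts 5, 5, then 7 -> fires on the third poll.
results = iter([5, 5, 7])
check = make_change_trigger(lambda: next(results))
print([check() for _ in range(3)])  # [False, False, True]
```

The first poll only stores a baseline; every later poll fires exactly when the new result differs from the stored one.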


davidmakovoz
Level 4
Author

@Jediv thank you for the reply.

I have two triggers: a SQL query change trigger and a Python custom trigger.

They both check the same table dataiku_poc.CMR_CAMP_COPY

Here is what I have in the SQL query change trigger

     select count(*) from dataiku_poc.CMR_CAMP_COPY; 

Here is what I have in the python Custom Trigger

import dataiku
from dataiku.scenario import Trigger

# Load the monitored dataset and count its rows
mydataset = dataiku.Dataset("CMR_CAMP_COPY")
mydataset_df = mydataset.get_dataframe()
new_count = len(mydataset_df)

# Read the last known row count from a local project variable
p = dataiku.Project()
variables = p.get_variables()
CMR_count = int(variables["local"]["CMR_count"])

# Fire the trigger if the count changed, and store the new count
t = Trigger()
if new_count != CMR_count:
    variables["local"]["CMR_count"] = new_count
    p.set_variables(variables)
    t.fire()

Both triggers are set to run every 10 seconds with a grace period of 0 seconds.

Every time I delete or insert rows in the table, the Python custom trigger fires, but the SQL query change trigger never does.

What am I doing wrong? Is there a way to troubleshoot it?

Thank you

 

Jediv
Dataiker

Hi David,

I can confirm that the SQL query trigger should work as I described. To debug your use case, I would suggest the following:

1. Make sure that auto-triggers are turned on for the scenario.

2. Ensure no other triggers are running on the same scenario. If other triggers are running, the SQL query trigger may be suppressed from running.

3. Check that the query returns the expected count from a SQL notebook in the same project.

4. Search for the following in DATA_DIR/run/backend.log, replacing YOUR_PROJECT with the key of the project where your trigger is running:

trigger:YOUR_PROJECT.sql_query_change
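For step 4, one way to do this from a terminal is with grep. The DATA_DIR path below is an assumed placeholder for your instance's data directory, and YOUR_PROJECT for your project key:

```shell
# Hypothetical data directory -- replace with your instance's DATA_DIR,
# and YOUR_PROJECT with your project key.
DATA_DIR=${DATA_DIR:-/opt/dataiku/data}
# "|| true" keeps the command from failing when no line matches yet.
grep 'trigger:YOUR_PROJECT.sql_query_change' "$DATA_DIR/run/backend.log" || true
```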
