Scenario triggered by an external change
I am trying to create a scenario that is triggered by an external change to a dataset.
Specifically, I have dataset created from a Redshift table. And I have a recipe that processes this dataset.
I want to be able to trigger the recipe every time a change is made to the Redshift table. Any change to that table will be done 'outside' of Dataiku. It's some type of reporting table and new rows are added at some random times.
I set up a trigger that monitors the dataset changes, but it doesn't work. I remove and add rows to my table by running Redshift queries outside of Dataiku, and nothing happens.
I was thinking of using the "SQL query change" trigger and monitor the number of rows in the table. What is the best way to keep track of the number of rows, i.e. the change? Should I use an environmental variable of some sort? What is the best practice?
Thank you
Best Answer
-
Hi,
If I understand your goal to be:
I want my scenario to run when rows are added or removed from a sql dataset.
You can just write the following query in the sql query trigger:
Select count(*) from my_table;
The sql query trigger periodically runs that sql query and keeps track of the last result. If the new run has a result that doesn’t match the stored one, then the trigger fires and the scenario runs.
Answers
-
davidmakovoz Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2022, Neuron 2023 Posts: 67 Neuron
Are there any examples of "SQL query change" triggers in the documentation or anywhere else on the Dataiku website?
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
Hi @davidmakovoz
. I was able to found this link with google: https://gist.github.com/alexcombessie/64e5508d87e47685f9b5955cb18a9ee2It has a "coded" trigger that works on SQL datasets changes. It might help.
From the documentation or academy, I didn't find any explicit examples.
-
ben_p Neuron 2020, Registered, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant Posts: 143 ✭✭✭✭✭✭✭
Here is an example SQL query trigger from one of our projects, this trigger looks a a table with a new date prefix, which is created every morning by other job:
SELECT
max(cast(parse_date('%Y%m%d',
substr(table_id,13)) as timestamp)) as max_date
FROM `dataset.__TABLES__`
where table_id like '%ga_sessions_%'
and table_id not like '%intraday%'
and row_count > 50000Does this help?
Ben
-
davidmakovoz Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2022, Neuron 2023 Posts: 67 Neuron
Ben,
Thank you for the example. I have a conceptual question. You have a query that returns a value, a date. How does it trigger the scenario? You'd think a trigger should involve a condition, e.g. a change in a dataset, time passed, the custom python trigger has an explicit if statement that runs t.fire().
Where is a condition here? Your query returns a date or comes back empty if row_count < 50000 (btw where does row_count come from?). How does it trigger the steps?
Thank you
-
davidmakovoz Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2022, Neuron 2023 Posts: 67 Neuron
@Jediv
thank you for the reply.I have two triggers: a SQL query change trigger and a python Custom trigger.
They both check the same table dataiku_poc.CMR_CAMP_COPY
Here is what I have in the SQL query change trigger
select count(*) from dataiku_poc.CMR_CAMP_COPY;
Here is what I have in the python Custom Trigger
import dataiku from dataiku import pandasutils as pdu import pandas as pd from dataiku.scenario import Trigger mydataset = dataiku.Dataset("CMR_CAMP_COPY") mydataset_df = mydataset.get_dataframe() p = dataiku.Project() variables = p.get_variables() CMR_count = int(variables["local"]["CMR_count"]) t = Trigger() new_count = len(mydataset_df) if new_count != CMR_count: variables["local"]["CMR_count"] = new_count p.set_variables(variables) t.fire()
Both triggers have Run every 10 seconds, Grace period 0 seconds.
Every time I delete or insert rows in the table the python Custom trigger is triggered and the SQL never triggers.
What I am doing wrong? Is there a way to troubleshoot it?
Thank you
-
Hi David,
I can confirm that the SQL query should work as I described. To debug your use case I would suggest the following:
1. Make sure that auto-triggers are turned on for the scenario.
2. Ensure no other triggers are running on the same scenario. If other triggers are running the SQL Query trigger maybe suppressed from running.
3. Check that the query returns the expected count from a SQL notebook in the same project.
4. Search for the following in the DATA_DIR/run/backend.log where you replace YOUR_PROJECT with the name of the project where your trigger is running:
trigger:YOUR_PROJECT.sql_query_change