I am trying to create a scenario that is triggered by an external change to a dataset.
Specifically, I have a dataset created from a Redshift table, and a recipe that processes this dataset.
I want to trigger the recipe every time a change is made to the Redshift table. Any change to that table is made 'outside' of Dataiku: it's a reporting table of some kind, and new rows are added at random times.
I set up a trigger that monitors dataset changes, but it doesn't work: I remove and add rows to my table by running Redshift queries outside of Dataiku, and nothing happens.
I was thinking of using the "SQL query change" trigger and monitoring the number of rows in the table. What is the best way to keep track of the row count, i.e. the change? Should I store it in a project variable of some sort? What is the best practice?
Hi @davidmakovoz. I was able to find this link with Google: https://gist.github.com/alexcombessie/64e5508d87e47685f9b5955cb18a9ee2
It has a "coded" trigger that works on SQL datasets changes. It might help.
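The pattern in that gist can be sketched roughly as follows: a custom Python trigger that counts the rows, compares the count against the value stored in a project variable, and fires the scenario when they differ. This is only a sketch assuming it runs inside a DSS custom Python trigger; the connection name `redshift-conn`, the table `my_schema.reporting_table`, and the variable key `last_row_count` are all placeholders you would replace with your own.

```python
def should_fire(previous_count, current_count):
    """Fire when there is no stored count yet, or the count has changed."""
    return previous_count is None or current_count != previous_count


def check_table_and_fire():
    # These imports are only available inside DSS, hence kept local so the
    # comparison logic above stays testable outside of it.
    # Connection, table, and variable names below are placeholders.
    import dataiku
    from dataiku import SQLExecutor2
    from dataiku.scenario import Trigger

    # Count the rows directly in Redshift.
    executor = SQLExecutor2(connection="redshift-conn")
    df = executor.query_to_df("select count(*) as n from my_schema.reporting_table")
    current_count = int(df["n"][0])

    # Compare with the count stored in the project variables.
    project = dataiku.api_client().get_default_project()
    variables = project.get_variables()
    previous_count = variables["standard"].get("last_row_count")

    if should_fire(previous_count, current_count):
        # Persist the new count, then fire the scenario.
        variables["standard"]["last_row_count"] = current_count
        project.set_variables(variables)
        Trigger().fire()
```

Note that this only detects changes in the number of rows; rows updated in place without changing the count would not fire it.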
I didn't find any explicit examples in the documentation or the Academy.
Here is an example SQL query trigger from one of our projects. This trigger looks at a table with a date suffix in its name, which is created every morning by another job:
select max(cast(timestamp(substr(table_id, 13)) as timestamp)) as max_date
from my_dataset.__TABLES__  -- the select/from lines were truncated in the original post; dataset name is a placeholder
where table_id like '%ga_sessions_%'
and table_id not like '%intraday%'
and row_count > 50000
Does this help?
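For the Redshift case in the original question, note that the built-in "SQL query change" trigger itself remembers the last query result and fires whenever it changes, so there is no need to track the row count in a variable yourself. A minimal query (schema and table names are placeholders) could look like:

```sql
-- The trigger fires the scenario whenever the result of this query changes,
-- i.e. whenever rows are added to or removed from the table.
select count(*) as row_count
from my_schema.my_reporting_table
```

If rows can also be updated in place, you could additionally select something like the max of a last-modified timestamp column, assuming your table has one.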