Scenario Trigger on SQL Query change

jrjosias2 · April 2023

Hello,

I'm new to DSS and I'm struggling in order to get the executions from the Trigger in order to capture the change between the first run to the dataset? Is that possible ? Can someone says how ? The documentation is very poor (Launching a scenario — Dataiku DSS 11 documentation) if someone can share some additional documentation that can actually say something detailed about this feature, I'll be very thankful.

JordanB · April 2023

Hi @jrjosias2
,

The SQL trigger will fire any time the result of the SQL query is different from the previous run of that same query. For a SQL dataset, the dataset change trigger will only look at the definition. So, the data involved in that query does not really need to have changed, it can fire, for example, when using queries that return rows in a different order. To counteract that we recommend using queries that return an aggregation such as COUNT or MAX such as "select count(*) from the_table" query (or any query for which you know the result changes).

I hope this helps!

Thanks,

Jordan

jrjosias2 · April 2023

Hi @JordanB
thanks for your reply,

That part I'm aware of, and I've realized that by constantly changing the count of records on the source database, however, I'm not sure how to capture that value change from past/current trigger execution, or at least how to find it in any log.

So to simplify, my question is: Can I capture the value from the Trigger execution? If yes, you'd know how or know some piece of documentation that describes it? If not possible to capture, can I at least keep track of it through some trigger execution log?

Because from the Last runs logs, I can only see the steps from the scenario

Kind regards,

Josias

awallis · July 2023

If the dataset you're using to trigger is part of your flow...

1. Create a very small subtable using the "SQL" node and with logic like...

SELECT <FIELDS OR VALUES THAT YOU WANT TO REPORT ON / THAT CAUSED THE TRIGGER> 
FROM "<DSS_TABLE>"
HAVING <CONDITION> = (SELECT MAX(<CONDITION>) FROM <"DSS_TABLE">)

The "condition" piece here is that your trigger table may be large and contain irrelevant information. You can use that condition space to only grab the interesting subset of data that you want to report on.

2. As the very first step of your scenario, build that table.

3. As the second step of your scenario, add a python code step that takes those tables' values and inserts them into project variables:

import dataiku

client = dataiku.api_client()
project = dataiku.api_client().get_project(dataiku.default_project_key())

variables = project.get_variables()

#load in dataset containing facts
facts = dataiku.Dataset("<TABLE>")
facts_df = keys.get_dataframe()

#update vars in py array
#the variables do not need to be initialized.  just add them.
variables['standard']['<FIELDNAME>']= max(facts_df.<FIELDNAME>) #aggregation not required, but it is helpful to ensure that only a single value lands in the project variable
variables['standard']['<FIELDNAME2>']= max(facts_df.<FIELDNAME2>)

#load vars to project
project.set_variables(variables)

4. Now, in your trigger (and elsewhere in the project) you can reference those values by using ${FIELDNAME}. You can use those values in a post-run reporter, even! Even if your job fails (or especially if) the reporter can output potentially useful information.

Turribeach · July 2023

Awallis solution it's valid if you like to do it in Python but there are other ways too. You could use the results of a SQL step, retrieve the value of a metric or retrieve the value of a check, all of which are documented here:

https://doc.dataiku.com/dss/latest/scenarios/variables.html#using-the-results-of-a-previous-sql-step

This value can then be stored in a scenario variable which can the used in other steps or an email reporter.

Scenario Trigger on SQL Query change

Answers

Categories

Setup Info

Tags