Hands-On: Custom Metrics, Checks & Scenarios

tgb417
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron
edited July 16 in Academy Discussions

In this part of the hand-on exercise for Advanced Designer, the Automation module:

Hands-On: Custom Metrics, Checks & Scenarios

I was instructed to create a custom SQL step with the following code.

SELECT
    COUNT(*) AS "state_transactions",
    "merchant_state"
FROM "${projectKey}_transactions_joined_prepared"
WHERE "merchant_state" IS NOT NULL
GROUP BY "merchant_state"
ORDER BY "state_transactions" DESC
LIMIT 1

This consistently produced the error message.

ERROR: relation "DKU_TUT_AUTOMATION_transactions_joined_prepared" does not exist Position: 72

When looking at the Logs I discovered the following.

[2022/06/19-16:24:52.037] [FT-ScenarioThread-1u0skXZS-10233] [INFO] [dku.sql.runtime] scenario DKU_TUT_AUTOMATION.MYCUSTOMSENARIO#2022-06-19-16-24-51-997 - [ct: 16] Executing main statement: SELECT
    COUNT(*) AS "state_transactions",
    "merchant_state"
FROM "DKU_TUT_AUTOMATION_transactions_joined_prepared"
WHERE "merchant_state" IS NOT NULL
GROUP BY "merchant_state"
ORDER BY "state_transactions" DESC
LIMIT 1
[2022/06/19-16:24:52.039] [FT-ScenarioThread-1u0skXZS-10233] [DEBUG] [dku.resourceusage] scenario DKU_TUT_AUTOMATION.MYCUSTOMSENARIO#2022-06-19-16-24-51-997 - [ct: 18] Reporting start of CRU:{"type":"SQL_QUERY","id":"Cs5sNtsCuIspjaYz","startTime":1655670292039,"sqlQuery":{"connectionUsageId":"xKfFJ46cPK06UrAU","connection":"postgreSQL","query":"SELECT\n    COUNT(*) AS \"state_transactions\",\n    \"merchant_state\"\nFROM \"DKU_TUT_AUTOMATION_transactions_joined_prepared\"\nWHERE \"merchant_state\" IS NOT NULL\nGROUP BY \"merchant_state\"\nORDER BY \"state_transactions\" DESC\nLIMIT 1"}}
[2022/06/19-16:24:52.040] [FT-ScenarioThread-1u0skXZS-10233] [ERROR] [dku.sql.runtime] scenario DKU_TUT_AUTOMATION.MYCUSTOMSENARIO#2022-06-19-16-24-51-997 - Failed query
org.postgresql.util.PSQLException: ERROR: relation "DKU_TUT_AUTOMATION_transactions_joined_prepared" does not exist
  Position: 72
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2553)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2285)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:323)
    at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:473)
    at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:393)
    at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164)
    at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:153)
    at com.dataiku.dip.queries.SQLQueryRuntime.executeAllStatementsUpToSelect(SQLQueryRuntime.java:363)
    at com.dataiku.dip.queries.SQLQueryRuntime.getPage(SQLQueryRuntime.java:558)
    at com.dataiku.dip.scheduler.steps.ExecuteSQLStepRunner.run(ExecuteSQLStepRunner.java:210)
    at com.dataiku.dip.server.services.ScenariosService.runStep(ScenariosService.java:655)
    at com.dataiku.dip.scheduler.scenarios.StepBasedScenarioRunner.run(StepBasedScenarioRunner.java:226)
    at com.dataiku.dip.scheduler.ScenarioThread.execute(ScenarioThread.java:146)
    at com.dataiku.dip.futures.FutureThreadBase.run(FutureThreadBase.java:96)
[2022/06/19-16:24:52.041] [FT-ScenarioThread-1u0skXZS-10233] [INFO] [dku.sql.runtime] scenario DKU_TUT_AUTOMATION.MYCUSTOMSENARIO#2022-06-19-16-24-51-997 - [ct: 20] Closing SQL Query runtime
[2022/06/19-16:24:52.041] [FT-ScenarioThread-1u0skXZS-10233] [INFO] [dku.sql.runtime] scenario DKU_TUT_AUTOMATION.MYCUSTOMSENARIO#2022-06-19-16-24-51-997 - [ct: 20] Closing main statement

I then took the query provided in the instructions and discovered it did not work in a SQL recipe by itself either . I was using one of my database connections that is working OK.

After some experimentation, I did discover by adding the SCHEMA for my connection manually. Things seems to work OK. In my case in the Connection the skema was listed as "dku". See below for the modification of the SQL.

SELECT COUNT(*) AS "state_transactions"
, "merchant_state"
FROM "dku"."${projectKey}_transactions_joined_prepared"
WHERE "merchant_state" IS NOT NULL
GROUP BY "merchant_state"
ORDER BY "state_transactions" DESC
LIMIT 1

Here are some configuration items.

  • DSS V10.0.7
  • PostrgreSQL V 13.2
  • Local OS is Mac OS 10.15.7

Am I missing something here?

Is my use of a Skema in a database connection unusual?

Is this a defect in 10.0.7? Most other places in DSS I do not run into this problem.

Hope this helps others.

Answers

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron
    edited July 17

    Also,

    I discovered that the logic does not work out to correctly refresh the transactions_by_merchant_id to be Colorado Records. The instructions just above this point suggest that:

    When the scenario finishes, open the transactions_by_merchant_id dataset 
    to observe the change in output. It turns out that “Colorado” is the state
    with the most transactions.

    Although, it is true that Colorado is the states with the most transaction the table transactions_by_merchant_id dataset does not get updated. Because the Steps in the Scenario have already update the stored variable before this check. See below in orange.

    from dataiku.scenario import Scenario
    import dataiku
    
    # The Scenario object is the main handle from which you initiate steps
    scenario = Scenario()
    state_name = scenario.get_all_variables()["state_name"]
    if state_name != "Colorado":
        scenario.clear_dataset("transactions_by_merchant_id")
        scenario.build_dataset("transactions_by_merchant_id")

    if state_name != "Colorado": would never be true.

    One might try.

    from dataiku.scenario import Scenario
    import dataiku
    
    # The Scenario object is the main handle from which you initiate steps
    scenario = Scenario()
    state_name = scenario.get_all_variables()["state_name"]
    if state_name == "Colorado":
        scenario.clear_dataset("transactions_by_merchant_id")
        scenario.build_dataset("transactions_by_merchant_id")

    Then the statements in the Hands on instruction make some sense.

Setup Info
    Tags
      Help me…