
How to diagnose a failing SQL Probe metric (Dataiku 7.01)

Keufran
Level 2

Hello World,

We are failing to compute a metric (defined as a SQL Probe) on a dataset. The query works fine in a query editor but fails when used as a probe. How can we diagnose this?

We have, of course, double-checked the syntax, the table path notation, etc.

CoreyS
Community Manager

Hi, @Keufran! Can you provide any further details on the thread to help users find a solution for you? Also, can you let us know if you've tried any fixes already? This should lead to a quicker response from the community.

Looking for more resources to help you use DSS effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as ‘Accepted Solution’ to help others like you!
Keufran
Level 2
Author

Thank you @CoreyS for your remarks.

Unfortunately, I don't see what to add. My query fails as a SQL Probe but works in a query editor. DSS gives no explicit error, so how can I diagnose this?

CoreyS
Community Manager

Thanks, @Keufran. Hopefully this will at least give the thread more visibility and help you find a solution.

Marlan
Neuron

Hi @Keufran,

Have you tried "Click to run this now" / "Run" on the Metrics Edit page, below the SQL Probe? What is the result of that?

And can you say more about what you mean by "fail"? Do you see an error message somewhere, does the result not update, or is it incorrect?

Marlan

Keufran
Level 2
Author

Hi @Marlan,

Thanks for your interest.

When I click "Run this now", the compute dialog opens and no error is shown. When I then click "Last run results", only the computation time is displayed, not the metric.

Here is the query:

SELECT
    A.nb_cmd[0]/(A.nb_cmd[0]+A.nb_cmd[1])*100 AS percent_training
FROM (
    SELECT
        COLLECT_LIST(B.nb_cmd) AS nb_cmd
    FROM (
        SELECT
            is_placebo,
            COUNT(DISTINCT ref_externe) AS nb_cmd
        FROM
            ${hiveconf:database_socle}.${projectKey}_commandes_ftth_historique_a_jour
        GROUP BY
            is_placebo
        ORDER BY
            is_placebo DESC
    ) B
) A;

The dataset is not purely managed by DSS. It's on HDFS.

With a PySpark script, everything works fine:

import dataiku

def process(dataset, partition_id):
    # dataset is a dataiku.Dataset object
    bdd = dataiku.get_custom_variables()["database_socle"]
    project = dataiku.get_custom_variables()["projectKey"]
    query = (
        "SELECT A.nb_cmd[0]/(A.nb_cmd[0]+A.nb_cmd[1])*100 AS percent_training "
        "FROM (SELECT COLLECT_LIST(B.nb_cmd) AS nb_cmd "
        "FROM (SELECT is_placebo, COUNT(DISTINCT ref_externe) AS nb_cmd "
        f"FROM {bdd}.{project}_commandes_ftth_historique_a_jour "
        "GROUP BY is_placebo ORDER BY is_placebo DESC) B) A"
    )
    # Run the query through Hive and return the single value as the metric
    dataset = dataiku.Dataset("commandes_ftth_historique_a_jour")
    executor = dataiku.HiveExecutor(dataset=dataset)
    resultdf = executor.query_to_df(query)
    res = resultdf.iloc[0].values[0]
    return {"placebo": res}
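As a sanity check on whatever value the probe eventually reports, the arithmetic of the query above can be reproduced in plain Python. This is only a minimal sketch and not part of the original probe: `percent_training` is a hypothetical helper, the sample counts are made up, and returning `None` when a group is missing or the total is zero is an assumption about how one might mirror a NULL result from the SQL expression.

```python
from typing import Optional, Sequence


def percent_training(nb_cmd: Sequence[int]) -> Optional[float]:
    """Mirror nb_cmd[0] / (nb_cmd[0] + nb_cmd[1]) * 100 from the query.

    nb_cmd holds the distinct-order counts per is_placebo group, in the
    order produced by the inner query. Returns None when fewer than two
    groups are present or the total is zero (assumption: this stands in
    for a NULL result of the SQL expression).
    """
    if len(nb_cmd) < 2:
        return None
    total = nb_cmd[0] + nb_cmd[1]
    if total == 0:
        return None
    return nb_cmd[0] / total * 100


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(percent_training([25, 75]))  # 25.0
    print(percent_training([40]))      # None
```

If the table happened to contain a single is_placebo value, a helper like this would return `None`, which may be worth ruling out when a probe run completes but shows no metric.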

 

Marlan
Neuron

Hi @Keufran,

It's odd that you don't get an error message. I'd expect one, given that you are using a couple of variables in your SQL script. Unfortunately, variables other than those listed in the sample code (${DKU_DATASET_TABLE_NAME}, ${DKU_PARTITION_FILTER}, etc.) are not available in SQL Probes.

I have submitted an idea to enable all project variables in SQL Probes. The status is Planned so I'd think it'd be coming within the next few releases.

After you click "Run this now" / "Run", does the bar below the probe definition turn green? I'd expect it to turn red and display an error message.

In any case, try replacing the variables you use with their values and see if it works then. If so, you'll need to decide whether you are OK with hard-coding those values. I was not, and my workaround was to create an extra recipe and dataset in my flow to do part of the calculation and then apply a SQL Probe to that. I'm not a fan of cluttering the flow with unneeded stuff, but for me it was better than hard-coding and having that cause problems later.
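The "replace the variables with their values" test can be done mechanically rather than by hand. Below is a minimal sketch, assuming the probe text uses `${name}` or `${hiveconf:name}` placeholders as in the query earlier in the thread. `expand_variables` and the sample values are hypothetical and are not a DSS API. It fills in the values you supply and lists any placeholder it could not resolve, so the resulting text can be pasted into the probe.

```python
import re
from typing import Dict, List, Tuple

# Matches ${name} and ${prefix:name}; captures only the variable name.
_PLACEHOLDER = re.compile(r"\$\{(?:\w+:)?(\w+)\}")


def expand_variables(sql: str, values: Dict[str, str]) -> Tuple[str, List[str]]:
    """Replace known placeholders and report the ones left untouched."""
    unresolved: List[str] = []

    def substitute(match):
        name = match.group(1)
        if name in values:
            return values[name]
        unresolved.append(name)
        return match.group(0)  # leave unknown placeholders as written

    return _PLACEHOLDER.sub(substitute, sql), unresolved


if __name__ == "__main__":
    template = (
        "SELECT COUNT(*) FROM "
        "${hiveconf:database_socle}.${projectKey}_commandes_ftth_historique_a_jour"
    )
    # "my_db" is a made-up value for illustration.
    sql, missing = expand_variables(template, {"database_socle": "my_db"})
    print(sql)
    print(missing)
```

Any name that comes back in the second list is still a placeholder in the query text and would need a value before the hard-coded version can be tried in the probe.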

Marlan

 

 

Keufran
Level 2
Author

Thanks again Marlan for your interest.

The results are the same whether or not we use the variables.

After clicking "Run" the status bar is green (RGB(215,238,205) to be precise as I'm colour-blind 🙂 ). But "Last run results" display only the duration, not the metric itself.

The dataset contains more than 15M rows, so duplicating it (well, almost) to do the calculations is not an option for now. I prefer to use the PySpark script to work around the SQL Probe problem.

We will soon upgrade to Dataiku 9, and I hope the problem will vanish...

Marlan
Neuron

Yes, that is green. It's very odd that it appears to run successfully with variables (that is not my experience on v9). OK, well, I'm out of ideas. Sorry I couldn't help.

Marlan
