Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Hello World,
We fail to compute a metric (as a SQL Probe) on a dataset. The request works great in a query editor but fails when used as a probe. How can we diagnose ?
We of course double-checked syntax, table paths notation, etc...
Hi, @Keufran! Can you provide any further details on the thread to assist users in helping you find a solution. Also, can you let us know if you’ve tried any fixes already?This should lead to a quicker response from the community.
Thank you @CoreyS for your remarks.
Infortunatly, I don't see what to add. MA request is failing as a SQL Probe but working in a query editor. DSS gives no explicit error, thus how can I diagnose ?
Thanks @Keufran hopefully this will at least give some more added visibility to help find you a solution.
Hi @Keufran,
Have you tried the Click to run this now / Run on the Metrics Edit page below the SQL Probe? What is the result of that?
And can you say more about what you mean by fail? Do you see an error message somewhere or the result doesn't update or it's incorrect?
Marlan
Hi @Marlan,
Thanks for your interest.
When I "run this now" the compute dialog opens, no error is shown and when I click on "Last run results" only the computation time is displayed, not the metric.
Here is the request:
SELECT
A.nb_cmd[0]/(A.nb_cmd[0]+A.nb_cmd[1])*100 AS percent_training
FROM (
SELECT
COLLECT_LIST(B.nb_cmd) AS nb_cmd
FROM (
SELECT
is_placebo,
COUNT(DISTINCT ref_externe) AS nb_cmd
FROM
${hiveconf:database_socle}.${projectKey}_commandes_ftth_historique_a_jour
GROUP BY
is_placebo
ORDER BY
is_placebo DESC
) B
) A;
The dataset is not purly managed by DSS. It's HDFS.
With a pySpark script, all is fine
def process(dataset, partition_id):
# dataset is a dataiku.Dataset object
bdd = dataiku.get_custom_variables()["database_socle"]
project = dataiku.get_custom_variables()["projectKey"]
query = f"SELECT A.nb_cmd[0]/(A.nb_cmd[0]+A.nb_cmd[1])*100 AS percent_training FROM (SELECT COLLECT_LIST(B.nb_cmd) AS nb_cmd FROM (SELECT is_placebo,COUNT(DISTINCT ref_externe) AS nb_cmd FROM {bdd}.{project}_commandes_ftth_historique_a_jour GROUP BY is_placebo ORDER BY is_placebo DESC) B) A"
dataset = dataiku.Dataset("commandes_ftth_historique_a_jour")
executor = dataiku.HiveExecutor(dataset=dataset)
resultdf = executor.query_to_df(query)
res = resultdf.iloc[0].values[0]
return {"placebo": res}
Hi @Keufran,
It's odd that you don't get an error message as I'd expect one given that you are using a couple of variables in your SQL script. Unfortunately, variables besides those listed in the sample code (${DKU_DATASET_TABLE_NAME}, ${DKU_PARTITION_FILTER}, etc.) are not available in SQL Probes.
I have submitted an idea to enable all project variables in SQL Probes. The status is Planned so I'd think it'd be coming within the next few releases.
After clicking Run this now / Run the bar below the probe definition turns green? I'd expect it to turn red and display an error message.
In any case, try replacing the variables you use with their values and see if it works then. If so, you'll need to decide if you are OK hard coding those values or not. I was not and my workaround was to create an extra recipe and dataset in my flow to do part of the calculation and then apply a SQL Probe to that. Not a fan of cluttering the flow with unneeded stuff but for me was better than hard coding and having that cause problems later.
Marlan
Thanks again Marlan for your interest.
Results are the same if we use the variables or not.
After clicking "Run" the status bar is green (RGB(215,238,205) to be precise as I'm colour-blind 🙂 ). But "Last run results" display only the duration, not the metric itself.
The Dataset contains more than 15M rows, so duplicating it (well, almost) to do calculations is not an option for now. I prefer to use the PySpark to bypass the SQL Probe problem.
We will soon upgrade to Dataiku 9, I hope the problem will vanish...
Yes, that is green - very odd that it appears to run successfully with variables (not my experience on v9 ). OK, well I'm out of ideas. Sorry I couldn't help.
Marlan