Scenario - Defining a variable using a computed metric
Hi, I am having trouble finding the right syntax to extract a computed metric from a dataset.
I found a great post that got me to report (output to Reporter) a variable with an arbitrary value.
Now I'm trying to extract a computed metric, "records:COUNT_RECORDS", from a dataset. Dataiku documents this, but I can't retrieve the value:
https://doc.dataiku.com/dss/latest/scenarios/variables.html
I modified the syntax for my case:
${filter(parseJson(stepOutput_the_metrics)[‘projID.computed’].computed, x, x.metricId == ‘records:COUNT_RECORDS’)[0].value}
where I took projID from the URL of my project.
I get the following error when I run the scenario.
java.lang.IllegalArgumentException
Incorrect formula: '${filter(parseJson(stepOutput_the_metrics)[‘projID.computed’].computed, x, x.metricId == ‘records:COUNT_RECORDS’)[0].value}' : Missing number, string, identifier, regex, or parenthesized expression(Parsing error at offset 0), caused by: ParsingException: Missing number, string, identifier, regex, or parenthesized expression(Parsing error at offset 0)
1) How do I correct the formula?
2) In this case, I am only computing metrics for one dataset, but if this works I would want computed metrics from several datasets. How would I modify the syntax to retrieve the same metric from different datasets?
Thank you for your time!
Best Answer
-
Turribeach, Neuron
Hi, there are a few things you may be doing wrong. Be aware that both the project ID and the dataset ID are case sensitive. I covered in more detail how to build the formula to extract a metric in this other post:
If you scroll down, you will see how to extract the whole JSON and then build the extraction expression step by step.
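Inspecting the whole JSON first tells you which keys and metric IDs actually exist before you write the filter. As a rough illustration only, here is a Python sketch of that exploration step. The payload shape (a top-level key per "PROJECT.DATASET_NP" entry, each holding a "computed" list of objects with "metricId" and "value") is inferred from how the formulas in this thread index the step output; the project ID "MYPROJECT", the dataset "test" and the metric "example:OTHER_METRIC" are hypothetical, and the real step output may contain more fields.

```python
import json

# Hypothetical stand-in for the text of stepOutput_<step name>; only the fields
# the DSS formula relies on are reproduced here.
step_output = json.dumps({
    "MYPROJECT.test_NP": {
        "computed": [
            {"metricId": "records:COUNT_RECORDS", "value": "42"},
            {"metricId": "example:OTHER_METRIC", "value": "7"},  # hypothetical metric ID
        ]
    }
})

def describe_step_output(raw: str) -> dict:
    """Map each top-level key (project.dataset + partition suffix) to its metric IDs."""
    parsed = json.loads(raw)
    return {
        key: [metric["metricId"] for metric in entry.get("computed", [])]
        for key, entry in parsed.items()
    }

for key, metric_ids in describe_step_output(step_output).items():
    print(key, metric_ids)
```

Seeing the exact key spelling (including case) is what lets you copy it into the formula without guessing.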
Answers
-
Alexandru, Dataiker
Hi,
You can only retrieve the metrics from the dataset computed in the previous step.
If you computed metrics for multiple datasets in that step, you need to replace "CaMoYxZE" in "SECFILLINGS.CaMoYxZE_NP" with the respective dataset ID:
toNumber(filter(parseJson(stepOutput_the_metrics)['SECFILLINGS.CaMoYxZE_NP']['computed'], x, x["metricId"]=="basic:COUNT_RECORDS")[0].value)
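To get the same metric for several datasets computed in one step, the expression stays the same and only the key changes. The Python sketch below mirrors that idea outside DSS; it is not DSS code. It assumes the payload shape implied by the formula's indexing, uses hypothetical project and dataset names ("MYPROJECT", "orders", "customers"), assumes metric values arrive as strings, and uses float() as a stand-in for toNumber. The "_NP" suffix is the one used in this thread for non-partitioned datasets.

```python
import json

# Hypothetical step output where one Compute Metrics step covered two datasets.
step_output = json.dumps({
    "MYPROJECT.orders_NP": {"computed": [{"metricId": "records:COUNT_RECORDS", "value": "120"}]},
    "MYPROJECT.customers_NP": {"computed": [{"metricId": "records:COUNT_RECORDS", "value": "35"}]},
})

def metric_for(raw, project_id, dataset_id, metric_id="records:COUNT_RECORDS"):
    """Rough equivalent of toNumber(filter(parseJson(raw)[key]['computed'], ...)[0].value)."""
    key = f"{project_id}.{dataset_id}_NP"  # only this part changes from dataset to dataset
    matches = [m for m in json.loads(raw)[key]["computed"] if m["metricId"] == metric_id]
    return float(matches[0]["value"])  # float() stands in for DSS toNumber()

counts = {ds: metric_for(step_output, "MYPROJECT", ds) for ds in ("orders", "customers")}
print(counts)  # {'orders': 120.0, 'customers': 35.0}
```

In a Define variables step, the equivalent would be one variable per dataset, each with its own copy of the formula and its own key.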
If you want to retrieve metrics from another dataset whose metrics were not computed in this particular scenario, you can use the approach described here: https://community.dataiku.com/t5/Using-Dataiku/Compute-Metrics-using-Python-API/m-p/24763
Thanks,
-
Thank you @AlexT
I tried what you recommended and I am getting the following error.
java.lang.Exception
parseJson failed: Missing value at 0 [character 1 line 1]
I made the following changes to your syntax so it applies to my case:
'SECFILLINGS.CaMoYxZE_NP' changed to 'projectID.datasetname_NP'
Project ID: I got it from the URL.
Dataset name: I could not locate a "dataset ID", so I've been using the dataset name. That is also what I see used in the scenario logs.
I noticed your syntax for the metric is using
basic:COUNT_RECORDS
and I had been using
records:COUNT_RECORDS
I tried both and get the same error.
Also, I have the "Evaluate Variable" toggle switch to ON.
Anything else you recommend I try?
Thank you.
-
Turribeach, Neuron
I believe the right formula for non-partitioned datasets will be:
toNumber(filter(parseJson(stepOutput_REPLACE_WITH_COMPUTE_METRICS_STEP_NAME_WITH_NO_SPACES)['REPLACE_WITH_PROJECT_ID.REPLACE_WITH_DATASET_ID_NP']['computed'], x, x["metricId"]=="records:COUNT_RECORDS")[0].value)
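Because the formula has three placeholders that are easy to get wrong (the step name, the project ID and the dataset ID, all case sensitive), it can help to fill them in mechanically. The sketch below is a hypothetical Python helper, not part of DSS: it just substitutes values into the template above and refuses step names containing whitespace, since the variable name is built as "stepOutput_" plus the step name. "MYPROJECT" is a placeholder project ID.

```python
import re

# Template copied from the formula above; {step}, {project}, {dataset}, {metric} get filled in.
TEMPLATE = (
    "toNumber(filter(parseJson(stepOutput_{step})['{project}.{dataset}_NP']['computed'], "
    'x, x["metricId"]=="{metric}")[0].value)'
)

def build_metric_formula(step_name, project_id, dataset_id, metric_id="records:COUNT_RECORDS"):
    """Hypothetical helper: IDs are used exactly as given, because DSS treats them as case sensitive."""
    if re.search(r"\s", step_name):
        raise ValueError(f"step name {step_name!r} contains whitespace; rename the step first")
    return TEMPLATE.format(step=step_name, project=project_id, dataset=dataset_id, metric=metric_id)

print(build_metric_formula("Compute_Metrics", "MYPROJECT", "test"))
```

A default step name such as "Step #2" is rejected, which matches the advice to give the Compute Metrics step a name without spaces.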
-
It finally worked! It turns out I was doing two things wrong.
My final equation...
toNumber(filter(parseJson(stepOutput_Compute_Metrics)['ProjID.DatasetID_NP']['computed'], x, x["metricId"]=="records:COUNT_RECORDS")[0].value)
@AlexT
mentioned that I needed to add the dataset ID. So I looked at the logs, and the dataset ID is the name of the dataset. In my case it was simply 'test', which goes into the 'ProjID.DatasetID_NP' key. Thank you, Alex!
@Turribeach
you stated that I needed to add the Compute Metrics step name, "stepOutput_REPLACE_WITH_COMPUTE_METRICS_STEP_NAME_WITH_NO_SPACES", and I made the mistake of leaving the default step name "Step #2". When I looked at the logs, the step name was "the_metrics". I tried variations of the two with no success... then it occurred to me to rename the Compute Metrics step from the default to a name with no spaces. So I renamed it "Compute_Metrics", updated the variable formula in the "Define variable" step, and SUCCESS!!!
I am now able to send that scenario variable over to Reporter.
Thank you!!
-
Turribeach, Neuron
Glad you got there in the end. This is certainly an area Dataiku should improve, as suggested by this idea. On the positive side, you now have the tools and the knowledge to use this feature going forward.