Improved UX for Scenario Variables Setup
User Story
As a Citizen Data Scientist who is just starting to advance with Scenarios and who typically uses “shaker” Dataiku formula steps, I would find it helpful to have a more intuitive setup for getting dataset metrics into a place where scenario steps can use them.
Examples of Roadblocks:
- Non-Partitioned datasets must have "_NP" added to what you think is the name of the dataset.
- Having to write a formula like this to get a row count into a variable is not intuitive:
filter(parseJson(stepOutput_Compute_Metrics)['PROJECT_NAME.DATASET_NAME_NP'].computed, x, x["metricId"] == "records:COUNT_RECORDS")[0].value
- There is no obvious help, tutorial, training, knowledge base example, or code snippet to get a user through this, particularly for non-partitioned datasets. The existing examples are fairly limited and focus on Python code.
- If a dataset in your flow that is used to create a metric is temporarily empty, a variable based on a metric such as the min or max date in a certain column will fail. (One can work around this with a SQL-based metric and something like a coalesce to deal with the null values of an empty dataset.)
- When a flow contains a dataset that is filtered by a variable, the recursive flow build algorithm does not correctly recognize it during an automatic build. (One may be able to work around this issue by explicitly building each dataset in the flow in its own scenario step.)
- There is no user interface for seeing all of the possible values that one can extract from the computed metrics, neither in tooltips nor in a dedicated interface. Logs don't help much either. Having to discover a formula like this in order to put these values into the project variables is also a challenge:
parseJson(stepOutput_Compute_Metrics)['PROJECT_NAME.DATASET_NAME_NP'].computed
- Here is another example of similar challenges with Variables in Scenarios
- The ability to set both Local and Global Project variables.
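To make the two formulas above easier to read, here is a minimal Python sketch of what they appear to be doing: parse the step output as JSON, look up the "PROJECT.DATASET_NP" key, and pick the entry whose metricId matches. The sample JSON is hypothetical and shaped only after the keys the formulas touch (computed, metricId, value); the real step output likely carries more fields, and the second metric id and both helper names are made up for illustration. The default argument mirrors the empty-dataset roadblock by returning a fallback instead of failing.

```python
import json

# Hypothetical stand-in for stepOutput_Compute_Metrics. Only the keys used by
# the formulas in this post are modeled; values are assumed to be strings.
SAMPLE_STEP_OUTPUT = json.dumps({
    "PROJECT_NAME.DATASET_NAME_NP": {
        "computed": [
            {"metricId": "records:COUNT_RECORDS", "value": "1250"},
            {"metricId": "col_stats:MAX:order_date", "value": "2023-04-30"},
        ]
    }
})


def list_metric_ids(step_output, dataset_key):
    """List every metricId available for one dataset (the 'what can I extract?' question)."""
    computed = json.loads(step_output)[dataset_key]["computed"]
    return [metric["metricId"] for metric in computed]


def get_metric_value(step_output, dataset_key, metric_id, default=None):
    """Python equivalent of the filter(...)[0].value formula, with a fallback.

    Returns `default` when the dataset key, the metric, or its value is missing,
    which is roughly what a coalesce would do for an empty dataset.
    """
    computed = json.loads(step_output).get(dataset_key, {}).get("computed", [])
    matches = [m for m in computed if m.get("metricId") == metric_id]
    if not matches or matches[0].get("value") is None:
        return default
    return matches[0]["value"]


if __name__ == "__main__":
    key = "PROJECT_NAME.DATASET_NAME_NP"
    print(list_metric_ids(SAMPLE_STEP_OUTPUT, key))
    print(get_metric_value(SAMPLE_STEP_OUTPUT, key, "records:COUNT_RECORDS"))
    print(get_metric_value(SAMPLE_STEP_OUTPUT, key, "not:A_METRIC", default="0"))
```

This is only an illustration of the lookup logic, not how DSS evaluates the formula internally. A UI that showed the equivalent of list_metric_ids for a Compute Metrics step would address most of the discovery problem described above.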
Comments
-
Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 320 Neuron
I wholeheartedly agree with this suggestion. Thanks for writing it up, Tom (@tgb417). We had a similar need recently, but rather than accessing metrics we wanted to access checks. I wrote a Python script to access the results of a Run Checks script. I posted it here asking if there was a better way, thinking there had to be one. In fact, using a formula would probably have been better, but figuring out how to write it would have been no small task.
The use case of checking values of metrics (and checks) and taking appropriate action within a scenario seems like a reasonable and common one that should be easier.
Marlan
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
@Marlan, thanks for joining the conversation. It helps to know that I’m not alone with this challenge. I will take a look at your script as well.
Re-reviewing the Academy course was helpful, but it did not cover the formula language “shaker” approach to these questions, which would be helpful to folks getting started.
https://academy.dataiku.com/path/advanced-designer/automation-course-1/675879
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,090 Neuron
Upvoted, as this is indeed an area that needs improvement. Here is a post showing a use case for metric values being used in scenario logic. Note the complexity in extracting the metric value. It's doable but certainly needs more of a "coder" than a "clicker"...
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
Thanks for joining this conversation, and thank you for your wonderful post.
-
Thank you for explaining. This saved me a lot of time, @tgb417.