
Improved UX for Scenario Variables Setup

User Story

As a Citizen Data Scientist who is just starting to advance with Scenarios and who typically uses “shaker” Dataiku formula steps, I would find it helpful to have a more intuitive setup for getting dataset metrics into a place where scenario steps can use them.

Example of Roadblocks:

  1. Non-partitioned datasets must have "_NP" appended to what you would expect the dataset name to be.
  2. Having to write a formula like this in order to get a row count into a variable is not intuitive.
    filter(parseJson(stepOutput_Compute_Metrics)['PROJECT_NAME.DATASET_NAME_NP'].computed, x, x["metricId"] == "records:COUNT_RECORDS")[0].value
  3. There is no obvious help, tutorial, training, knowledge base example, or code snippet to get a user through this, particularly for non-partitioned datasets. The existing examples are fairly limited and focus on Python code.
  4. If a dataset in your flow that is used to compute a metric is temporarily empty, a variable based on a metric such as the min or max date in a certain column will fail. (One can work around this with a SQL-based metric and something like a coalesce to deal with the null values of an empty dataset.)
  5. A flow containing a dataset that is filtered by a variable is not handled correctly by the recursive flow build algorithm. (One may be able to work around this issue by explicitly building each dataset in the flow in its own scenario step.)
  6. There is no user interface to see all of the possible values that one can extract from the computed metrics, either in tooltips or in a dedicated interface. Logs don't help much either. A formula like this one, which puts these values into the project variables, is also hard to discover.
    parseJson(stepOutput_Compute_Metrics)['PROJECT_NAME.DATASET_NAME_NP'].computed
  7. Here is another example of similar challenges with Variables in Scenarios
  8. The ability to set both Local and Global Project variables.
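
For anyone trying to read the formula in roadblock 2, here is a rough plain-Python sketch of what it appears to do, plus the kind of coalesce-style fallback mentioned in roadblock 4. It does not use the Dataiku API. The sample payload is an assumption inferred only from the formula above (the real step output may have more fields or a different shape), and `metric_value` and the `col_stats:MAX:order_date` metric ID are hypothetical names used for illustration.

```python
import json

# Assumed shape of the "Compute metrics" step output, inferred from the
# formula in the post. The real payload may differ.
step_output = json.dumps({
    "PROJECT_NAME.DATASET_NAME_NP": {
        "computed": [
            {"metricId": "records:COUNT_RECORDS", "value": "1250"},
            # Hypothetical metric on an empty dataset: no value available.
            {"metricId": "col_stats:MAX:order_date", "value": None},
        ]
    }
})


def metric_value(raw, project, dataset, metric_id, default=None):
    """Hypothetical helper mirroring the formula for a non-partitioned dataset."""
    # Non-partitioned datasets are keyed as PROJECT.DATASET plus "_NP".
    key = f"{project}.{dataset}_NP"
    # parseJson(...)[key].computed
    computed = json.loads(raw).get(key, {}).get("computed", [])
    # filter(..., x, x["metricId"] == metric_id)
    matches = [m for m in computed if m.get("metricId") == metric_id]
    # [0].value, falling back to a default when the metric is missing or null
    if not matches or matches[0].get("value") is None:
        return default
    return matches[0]["value"]


row_count = metric_value(
    step_output, "PROJECT_NAME", "DATASET_NAME", "records:COUNT_RECORDS"
)
max_date = metric_value(
    step_output, "PROJECT_NAME", "DATASET_NAME",
    "col_stats:MAX:order_date", default="1970-01-01",
)
print(row_count, max_date)
```

The default argument plays the same role as the SQL coalesce workaround: an empty dataset yields a known placeholder instead of a failure.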
--Tom
5 Comments

I wholeheartedly agree with this suggestion. Thanks for writing it up, Tom (@tgb417). 

We had a similar need recently, but rather than accessing metrics we wanted to access checks. I wrote a Python script to access the results of a Run Checks script. I posted it here, asking if there was a better way, thinking there had to be one. In fact, using a formula would probably have been better, but figuring out how to write it would have been no small task.

The use case of checking values of metrics (and checks) and taking appropriate action within a scenario seems like a reasonable and common one that should be easier.

Marlan

@Marlan ,

Thanks for joining the conversation. It helps to know that I’m not alone with this challenge. I will take a look at your script as well.

Re-reviewing the Academy course was helpful, but it did not cover the formula language “shaker” approach to these questions, which would be helpful to folks getting started.

https://academy.dataiku.com/path/advanced-designer/automation-course-1/675879

ktgross15
Dataiker
Status changed to: Acknowledged

Thanks for the feedback @tgb417 and @Marlan , we hear you and will let you know if we have any updates here!

Katie

Turribeach
Level 6

Upvoted, as this is indeed an area that needs improvement. Here is a post showing a use case for metric values being used in scenario logic. Note the complexity in extracting the metric value. It's doable but certainly needs more of a "coder" than a "clicker"...

https://community.dataiku.com/t5/What-s-New/Want-to-Control-the-Execution-of-Scenario-Steps-With-Con...

 

@Turribeach ,

Thanks for joining this conversation, and thank you for your wonderful post.