User Story
As a Citizen Data Scientist who is just starting to advance with Scenarios and typically uses “shaker” Dataiku formula steps, I would find it helpful to have a more intuitive setup for getting Dataset Metrics into a place that is usable by scenario steps.
Examples of roadblocks:
- Non-partitioned datasets must have "_NP" appended to what you think is the name of the dataset.
- Having to write a formula like the following just to get a row count into a variable is not easily intuited.
filter(parseJson(stepOutput_Compute_Metrics)['PROJECT_NAME.DATASET_NAME_NP'].computed, x, x["metricId"] == "records:COUNT_RECORDS")[0].value
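For comparison, here is what the same extraction looks like in a Python scenario step. This is only a sketch: the payload below is a hypothetical example mimicking the shape implied by the formula above (a dict keyed by `PROJECT_NAME.DATASET_NAME_NP`, whose `computed` list holds objects with `metricId` and `value`), and the helper name is mine, not Dataiku's.

```python
import json

# Hypothetical payload with the shape implied by stepOutput_Compute_Metrics
step_output = json.dumps({
    "PROJECT_NAME.DATASET_NAME_NP": {
        "computed": [
            {"metricId": "records:COUNT_RECORDS", "value": "1234"},
            {"metricId": "basic:COUNT_COLUMNS", "value": "7"},
        ]
    }
})

def metric_value(raw_json, dataset_key, metric_id):
    """Return the value of one computed metric, or None if absent."""
    computed = json.loads(raw_json)[dataset_key]["computed"]
    matches = [m["value"] for m in computed if m["metricId"] == metric_id]
    return matches[0] if matches else None

row_count = metric_value(step_output, "PROJECT_NAME.DATASET_NAME_NP",
                         "records:COUNT_RECORDS")
```

Even spelled out like this, the filter-by-`metricId` pattern is something a Citizen Data Scientist should not have to reverse-engineer.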
- There is no obvious help, tutorial, training, knowledge base examples, or code snippets to get a user through this, particularly for non-partitioned datasets. The usage examples that do exist are fairly limited and focus on Python code.
- If a dataset in your flow that is used to compute a metric is temporarily empty, a variable based on a metric such as the min or max date of a column will fail. (One can work around this with a SQL-based metric and something like COALESCE to handle the null values of an empty dataset.)
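The COALESCE workaround mentioned above amounts to "use the first non-null value"; a minimal Python equivalent (the function name and fallback date are mine, purely illustrative) for guarding a metric that came back empty:

```python
def coalesce(*values):
    """Return the first value that is not None, mimicking SQL COALESCE."""
    for v in values:
        if v is not None:
            return v
    return None

# Fall back to a sentinel date when the max-date metric is null
# because the dataset was empty at build time.
max_date = coalesce(None, "1970-01-01")  # → "1970-01-01"
```

Having to hand-roll this guard for every date-based variable is exactly the kind of friction the scenario UI could absorb.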
- Automatic building of a flow that contains a dataset filtered by a variable is not correctly recognized by the recursive flow-build algorithm. (One may be able to work around this by explicitly building each dataset in the flow in its own scenario step.)
- There is no user interface that shows all of the possible values one can extract from the computed metrics: not in tooltips, not in a dedicated panel, and the logs don't help much either. Having to stumble on a formula like the following to put these values into project variables is a challenge in itself.
parseJson(stepOutput_Compute_Metrics)['PROJECT_NAME.DATASET_NAME_NP'].computed
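Absent a discovery UI, about the best one can do is dump the step output and walk it. A sketch of that enumeration in Python, against the same hypothetical payload shape as above (dataset key and metric IDs are illustrative examples, not real project values):

```python
import json

# Hypothetical payload with the shape implied by stepOutput_Compute_Metrics
metrics_json = json.dumps({
    "PROJECT_NAME.DATASET_NAME_NP": {
        "computed": [
            {"metricId": "records:COUNT_RECORDS", "value": "1234"},
            {"metricId": "col_stats:MAX:order_date", "value": "2023-01-31"},
        ]
    }
})

def list_metric_ids(raw_json):
    """List every (dataset, metricId) pair available in the step output."""
    parsed = json.loads(raw_json)
    return [(ds, m["metricId"])
            for ds, payload in parsed.items()
            for m in payload.get("computed", [])]

for dataset, metric_id in list_metric_ids(metrics_json):
    print(dataset, metric_id)
```

A tooltip or panel that surfaced this same list in the scenario editor would remove most of the guesswork.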
- Here is another example of similar challenges with variables in Scenarios:
- The ability to set both local and global project variables.
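For reference, the public Python API does round-trip a variables dict with "standard" (project-wide) and "local" sections via `DSSProject.get_variables()` / `set_variables()`; a minimal sketch of just the update logic (the helper and variable names here are mine, not Dataiku's):

```python
def set_project_variable(variables, name, value, local=False):
    """Set one variable in the 'local' or 'standard' (global) section
    of a DSS-style project variables dict."""
    scope = "local" if local else "standard"
    variables.setdefault(scope, {})[name] = value
    return variables

# Shape of the dict returned by DSSProject.get_variables()
variables = {"standard": {}, "local": {}}
set_project_variable(variables, "max_order_date", "2023-01-31")
set_project_variable(variables, "debug_mode", True, local=True)
# ...then push back with DSSProject.set_variables(variables)
```

The ask is for this same local/global choice to be a first-class option in the scenario "set variables" step, not something reserved for Python users.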
--Tom