Checking for dataset warn outcomes within a Scenario - is there a better way?

Marlan

Hi all,

Within a scenario, we would like to notify customers by email if dataset checks result in warnings. At minimum, we'd like to do this if there are any warnings. Better still, the email would name the checks that produced warnings.

We've been able to do this with a Python step after a Run Checks step in a scenario.

However, the Python script (included below) is fairly messy. Is there an easier way? Ideally, a Python step would not be required. If Python is required, could the script I included be simplified?



from dataiku.scenario import Scenario

# Create the main handle to interact with the scenario
scenario_obj = Scenario()
step_output = scenario_obj.get_previous_steps_outputs()

check_data = []
ok_cnt = 0
warn_cnt = 0
error_cnt = 0

for step in step_output:
    # Each step has a name and result
    step_name = step['stepName']
    result = step['result']  # result is a dict

    # Determine the type of step (unfortunately, there is apparently no field that specifies type)
    if 'success' in result:
        # result dict has key 'success'
        step_type = 'execute_sql'  # there may be another step type that has this (haven't checked all of them)
    else:
        some_value_in_result = result[list(result.keys())[0]]  # arbitrary value in result dict (which itself may be a dict)
        if 'computed' in some_value_in_result:
            step_type = 'compute_metrics'
        elif 'results' in some_value_in_result:
            step_type = 'run_checks'
        else:
            step_type = 'other'

    # Create variables for run checks (may be multiple checks for each dataset)
    if step_type == 'run_checks':
        # datasets are keys in result dict
        for dataset_info in result:
            # parse and clean up dataset name
            dataset_info_list = dataset_info.split('.')
            project_key = dataset_info_list[0]
            dataset_spec = dataset_info_list[1]
            if dataset_spec[-3:] == '_NP':
                # remove no partition text
                dataset_name = dataset_spec[:-3]
            else:
                dataset_name = dataset_spec

            # Extract check results
            dataset_value = result[dataset_info]
            checks_results = dataset_value['results']  # list of check results
            for check_result in checks_results:
                check_spec = check_result['check']
                check_value = check_result['value']
                check_label = check_spec['meta'].get('label', '<Unlabelled>')
                check_metric = check_spec['metricId'].split(':', 1)[1]  # strip off metric type (type:metric or type:metric:otherinfo)
                check_message = check_value['message']
                check_outcome = check_value['outcome']
                if check_outcome == 'OK':
                    ok_cnt += 1
                elif check_outcome == 'WARNING':
                    warn_cnt += 1
                elif check_outcome == 'ERROR':  # won't get here unless check step is set to ignore failure
                    error_cnt += 1
                check_dict = {'dataset': dataset_name, 'metric': check_metric, 'label': check_label,
                              'outcome': check_outcome, 'message': check_message}
                check_data.append(check_dict)

# Assign to variables (or could send email or take appropriate action here)
scenario_obj.set_scenario_variables(datasetCheckResults=check_data, okCount=ok_cnt, warnCount=warn_cnt, errorCount=error_cnt)
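
One possible simplification, offered only as a sketch: skip the step-type guessing and treat any per-dataset value that contains a 'results' list as Run Checks output. The sketch assumes the step output has exactly the shape the script above relies on (a dict keyed by 'PROJECT.dataset_NP', each holding 'results' entries with 'check' and 'value'). The function name collect_check_rows and the sample payload are hypothetical, made up here so the parsing can be tried outside DSS; they are not part of the Dataiku API.

```python
from collections import Counter


def collect_check_rows(step_outputs):
    """Flatten Run Checks results into one dict per check (hypothetical helper)."""
    rows = []
    for step in step_outputs:
        result = step.get('result') or {}
        if not isinstance(result, dict):
            continue
        for dataset_ref, dataset_value in result.items():
            # Assumption: only Run Checks steps carry a 'results' list per dataset
            if not isinstance(dataset_value, dict) or 'results' not in dataset_value:
                continue
            # 'PROJECT.dataset_NP' -> 'dataset'
            dataset_spec = dataset_ref.split('.', 1)[-1]
            dataset_name = dataset_spec[:-3] if dataset_spec.endswith('_NP') else dataset_spec
            for check_result in dataset_value['results']:
                check = check_result['check']
                value = check_result['value']
                rows.append({
                    'dataset': dataset_name,
                    # strip off the metric type prefix (type:metric or type:metric:otherinfo)
                    'metric': check['metricId'].split(':', 1)[-1],
                    'label': check.get('meta', {}).get('label', '<Unlabelled>'),
                    'outcome': value['outcome'],
                    'message': value.get('message', ''),
                })
    return rows


# Made-up payload mimicking the structure the original script expects
sample_outputs = [
    {'stepName': 'sql', 'result': {'success': True}},
    {'stepName': 'checks', 'result': {
        'MYPROJ.orders_NP': {'results': [
            {'check': {'metricId': 'basic:COUNT_RECORDS', 'meta': {'label': 'Row count'}},
             'value': {'outcome': 'WARNING', 'message': 'too few rows'}},
            {'check': {'metricId': 'col_stats:MIN:amount', 'meta': {}},
             'value': {'outcome': 'OK', 'message': ''}},
        ]},
    }},
]

rows = collect_check_rows(sample_outputs)
counts = Counter(row['outcome'] for row in rows)          # missing outcomes count as 0
warning_rows = [row for row in rows if row['outcome'] == 'WARNING']
```

In a scenario Python step, step_outputs would come from Scenario().get_previous_steps_outputs(), and rows, counts['OK'], counts['WARNING'] and counts['ERROR'] could be passed to set_scenario_variables as in the script above. Keeping the parsing in a plain function also makes it testable without a running scenario.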

Operating system used: Linux Red Hat
