Trying to get error details for Scenarios that last ran in Dataiku
Here is an example of what I am trying to run. It is definitely not correct, since it tries to use a method that does not exist on a list object.
Any direction on how to reference the correct methods would be greatly appreciated.
import dataiku
import pandas as pd
from datetime import datetime, timedelta
# Get the list of all projects
projects = dataiku.api_client().list_projects()
# Get the list of all scenario runs for the last 24 hours for each project
last_24_hours = datetime.now() - timedelta(hours=24)
failed_runs = []
for project in projects:
project_key = project['projectKey']
scenarios = dataiku.api_client().list_scenarios(project_key)
for scenario in scenarios:
last_run = scenario.get_last_finished_run()
scenario_runs += dataiku.api_client().list_scenario_runs(project_key, scenario['id'], from_date=last_24_hours)
failed_runs += [run for run in scenario_runs if run['outcome'] == 'FAILED']
# Create a dataframe with details on failed runs and error messages
df = pd.DataFrame(failed_runs)
df = df[['projectKey', 'scenarioName', 'scenarioVersion', 'outcome', 'errorType', 'errorMessage']]
print(df)
Operating system used: AL2
Best Answer
-
Turribeach,
Now I know I was way off. This definitely helps me get the output I am looking for.
I also need to remember to use a code block for proper code formatting when submitting a question. I am still learning how to use both Python and the API, and I can see what I need to learn, since I do not seem to have the basics down yet.
Thank you for getting past a limited code example.
Answers
-
Turribeach
Tip: Please use a code block, as otherwise the code indentation is lost, and as you know, indentation is mandatory in Python.
As I needed to write something similar, I took your code and modified it. You had several things wrong. You need to be careful because some of the API methods return an object handle, which you then use to retrieve the data with another method. In Python you can use type() and dir() to see an object's type and which methods it has. Here is my code, which doesn't do exactly what you want but gets you close enough, and with a few changes it will do what you want:
import dataiku
import pandas as pd
from datetime import datetime, timedelta

dataiku_client = dataiku.api_client()
project_keys = dataiku_client.list_project_keys()
failed_scenario_runs_to_check = 20
today = datetime.now()
month_ago = today - timedelta(days=30)

for project_key in project_keys:
    project_handle = dataiku_client.get_project(project_key)
    scenarios = project_handle.list_scenarios()
    for scenario in scenarios:
        scenario_runs_count = 0
        scenario_runs_by_date_count = 0
        failed_scenario_runs_count = 0
        scenario_handle = project_handle.get_scenario(scenario['id'])
        try:
            last_run = scenario_handle.get_last_finished_run()
            # Short-circuit evaluation: Only look further if the last run failed
            if last_run.outcome == 'FAILED':
                scenario_runs = scenario_handle.get_last_runs(limit = failed_scenario_runs_to_check, only_finished_runs=False)
                scenario_runs_count = len(scenario_runs)
                scenario_runs_by_date = scenario_handle.get_runs_by_date(month_ago, today)
                scenario_runs_by_date_count = len(scenario_runs_by_date)
                # Only check further if we have more than 1 run and at least 1 in the last month
                if scenario_runs_count > 1 and scenario_runs_by_date_count >= 1:
                    for scenario_run in scenario_runs:
                        run_outcome = scenario_run.get_info()["result"].get('outcome')
                        # If any runs have not failed skip to the next Scenario as we are looking for constantly failing scenarios
                        if run_outcome == 'FAILED':
                            failed_scenario_runs_count += 1
                        else:
                            break
                    if failed_scenario_runs_count == scenario_runs_count:
                        print(project_key + ' - ' + scenario['id'] + ' failed_scenario_runs_count: ' + str(failed_scenario_runs_count) + ' - ' + 'scenario_runs_by_date_count: ' + str(scenario_runs_by_date_count))
        except ValueError:
            continue