Trying to get error details for Scenarios that last run in Dataiku

jrmathieu63 · July 2023

Here is an example of what I am trying to run. It is definitely not correct, since it tries to use a method that does not exist for a list object.
Any direction on how to reference the correct methods would be greatly appreciated.

import dataiku
import pandas as pd
from datetime import datetime, timedelta

# Get the list of all projects
projects = dataiku.api_client().list_projects()

# Get the list of all scenario runs for the last 24 hours for each project
last_24_hours = datetime.now() - timedelta(hours=24)
failed_runs = []
for project in projects:
project_key = project['projectKey']
scenarios = dataiku.api_client().list_scenarios(project_key)

for scenario in scenarios:
last_run = scenario.get_last_finished_run()
scenario_runs += dataiku.api_client().list_scenario_runs(project_key, scenario['id'], from_date=last_24_hours)
failed_runs += [run for run in scenario_runs if run['outcome'] == 'FAILED']

# Create a dataframe with details on failed runs and error messages
df = pd.DataFrame(failed_runs)
df = df[['projectKey', 'scenarioName', 'scenarioVersion', 'outcome', 'errorType', 'errorMessage']]

print(df)

Operating system used: AL2

jrmathieu63 · August 2023

Turribeach,

Now I know I was way off. Your code definitely helps me get the output I am looking for.
I also need to remember to use a code block for proper code formatting when submitting a question.

As I am still learning both Python and the API, I can see what I need to study, since I do not seem to have the basics down yet.

Thank you for getting past a limited code example.

Turribeach · August 2023

Tip: Please use the code block, as otherwise the code indentation is lost and, as you know, indentation is a must in Python.

I needed to write something similar, so I took your code and modified it. You had several things wrong. Be careful: some of the API methods return an object handle, and you then call another method on that handle to retrieve the data. In Python you can use type() and dir() to see an object's type and the methods it has. Here is my code. It doesn't do exactly what you want, but it gets you close enough, and with a few changes it will do what you want:

import dataiku
import pandas as pd
from datetime import datetime, timedelta

dataiku_client = dataiku.api_client()
project_keys = dataiku_client.list_project_keys()
failed_scenario_runs_to_check = 20

today = datetime.now()
month_ago = today - timedelta(days=30)

for project_key in project_keys:
    project_handle = dataiku_client.get_project(project_key)
    scenarios = project_handle.list_scenarios()

    for scenario in scenarios:
        scenario_runs_count = 0
        scenario_runs_by_date_count = 0
        failed_scenario_runs_count = 0
        
        scenario_handle = project_handle.get_scenario(scenario['id'])
        
        try:
            last_run = scenario_handle.get_last_finished_run()
            
            # Early exit: only look further if the last run failed
            if last_run.outcome == 'FAILED':

                scenario_runs = scenario_handle.get_last_runs(limit = failed_scenario_runs_to_check, only_finished_runs=False)
                scenario_runs_count = len(scenario_runs)
                
                scenario_runs_by_date = scenario_handle.get_runs_by_date(month_ago, today)
                scenario_runs_by_date_count = len(scenario_runs_by_date)
                
                # Only check further if we have more than 1 run and at least 1 in the last month
                if scenario_runs_count > 1 and scenario_runs_by_date_count >= 1:
                    for scenario_run in scenario_runs:
                        run_outcome = scenario_run.get_info()["result"].get('outcome')
                        
                        # If any runs have not failed skip to the next Scenario as we are looking for constantly failing scenarios
                        if run_outcome == 'FAILED':
                            failed_scenario_runs_count += 1
                        else:
                            break
                            
                    if failed_scenario_runs_count == scenario_runs_count:
                        print(project_key + ' - ' + scenario['id'] + ' failed_scenario_runs_count: ' + str(failed_scenario_runs_count) + ' - ' + 'scenario_runs_by_date_count: ' + str(scenario_runs_by_date_count))
            
        # get_last_finished_run() raises ValueError when the scenario has no finished run yet
        except ValueError:
            continue
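
The run-checking logic above can also be pulled into small helper functions that work on plain dictionaries, which makes it easier to try out without a DSS instance. This is only a sketch: the helper names extract_outcome and is_constantly_failing are made up for illustration, and the dictionary shape is assumed to match what scenario_run.get_info() returns in the code above (a "result" key holding an "outcome"). A run that is still in progress is assumed to have no "result" yet.

```python
def extract_outcome(run_info):
    """Return the outcome string of a run info dict, or None if it is not available."""
    # Assumption: unfinished runs have no "result" entry, so fall back to an empty dict
    return (run_info.get("result") or {}).get("outcome")


def is_constantly_failing(run_infos):
    """True when there is more than one run and every run has outcome FAILED."""
    outcomes = [extract_outcome(info) for info in run_infos]
    return len(outcomes) > 1 and all(o == "FAILED" for o in outcomes)


# Hand-written sample data for illustration, not real DSS output
sample_runs = [
    {"result": {"outcome": "FAILED"}},
    {"result": {"outcome": "FAILED"}},
    {},  # assumed shape of a run that has not finished yet
]

print(is_constantly_failing(sample_runs[:2]))  # True
print(is_constantly_failing(sample_runs))      # False
```

In the loop above, the list passed in would be [scenario_run.get_info() for scenario_run in scenario_runs].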

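As a minimal sketch of the type() and dir() tip, the snippet below inspects a stand-in object. FakeScenarioHandle and public_methods are hypothetical names used only for illustration; with a real handle you would pass the object returned by the API call instead.

```python
class FakeScenarioHandle:
    """Hypothetical stand-in for a handle object returned by an API call."""

    def get_last_finished_run(self):
        return "run"

    def get_last_runs(self, limit=10):
        return []


def public_methods(obj):
    """List the callable attributes of obj whose names do not start with an underscore."""
    return [
        name for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    ]


handle = FakeScenarioHandle()
print(type(handle).__name__)   # the class of the object you are holding
print(public_methods(handle))  # the methods you can call on it

# A plain list has list methods only, which is why calling a
# scenario method on a list raises AttributeError
print("get_last_finished_run" in public_methods([]))
```

Running the same two calls on whatever an API method returns shows quickly whether you are holding a list, a dict, or a handle object.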