Trying to get error details for Scenarios that last ran in Dataiku

jrmathieu63 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 26 ✭✭✭✭✭

Here is an example of what I am trying to run, but it is definitely not correct, since it tries to use a method that does not exist for a list object.
Any direction on how to reference the correct methods would be greatly appreciated.

import dataiku
import pandas as pd
from datetime import datetime, timedelta

# Get the list of all projects
projects = dataiku.api_client().list_projects()

# Get the list of all scenario runs for the last 24 hours for each project
last_24_hours = datetime.now() - timedelta(hours=24)
failed_runs = []
for project in projects:
project_key = project['projectKey']
scenarios = dataiku.api_client().list_scenarios(project_key)

for scenario in scenarios:
last_run = scenario.get_last_finished_run()
scenario_runs += dataiku.api_client().list_scenario_runs(project_key, scenario['id'], from_date=last_24_hours)
failed_runs += [run for run in scenario_runs if run['outcome'] == 'FAILED']

# Create a dataframe with details on failed runs and error messages
df = pd.DataFrame(failed_runs)
df = df[['projectKey', 'scenarioName', 'scenarioVersion', 'outcome', 'errorType', 'errorMessage']]

print(df)


Operating system used: AL2


Best Answer

  • jrmathieu63
    jrmathieu63 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 26 ✭✭✭✭✭
    Answer ✓

    Turribeach,

    Now I know I was way off. This definitely helps me get the output I am looking for.
    I also need to remember to use a code block for proper code formatting when submitting a question.

    As I am still learning both Python and the API, I can see what I need to learn, since I do not seem to have the basics down yet.

    Thank you for looking past a limited code example.

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,728 Neuron
    edited July 17

    Tip: Please use the code block, as otherwise the code indentation is lost, and as you know, indentation is a must in Python.

    As I needed to write something similar, I took your code and modified it. You had several things wrong. You need to be careful because some of the API methods return an object handle, which you then use to retrieve the data with another method. In Python you can use type() and dir() to see an object's type and which methods it has. Here is my code, which doesn't do exactly what you want but gets you close enough, and with a few changes it will do what you want:

    import dataiku
    import pandas as pd
    from datetime import datetime, timedelta
    
    dataiku_client = dataiku.api_client()
    project_keys = dataiku_client.list_project_keys()
    failed_scenario_runs_to_check = 20
    
    today = datetime.now()
    month_ago = today - timedelta(days=30)
    
    for project_key in project_keys:
        project_handle = dataiku_client.get_project(project_key)
        scenarios = project_handle.list_scenarios()
    
        for scenario in scenarios:
            scenario_runs_count = 0
            scenario_runs_by_date_count = 0
            failed_scenario_runs_count = 0
            
            scenario_handle = project_handle.get_scenario(scenario['id'])
            
            try:
                last_run = scenario_handle.get_last_finished_run()
                
                # Short-circuit evaluation: Only look further if the last run failed 
                if last_run.outcome == 'FAILED':
    
                    scenario_runs = scenario_handle.get_last_runs(limit = failed_scenario_runs_to_check, only_finished_runs=False)
                    scenario_runs_count = len(scenario_runs)
                    
                    scenario_runs_by_date = scenario_handle.get_runs_by_date(month_ago, today)
                    scenario_runs_by_date_count = len(scenario_runs_by_date)
                    
                    # Only check further if we have more than 1 run and at least 1 in the last month
                    if scenario_runs_count > 1 and scenario_runs_by_date_count >= 1:
                        for scenario_run in scenario_runs:
                            run_outcome = scenario_run.get_info()["result"].get('outcome')
                            
                            # If any runs have not failed skip to the next Scenario as we are looking for constantly failing scenarios
                            if run_outcome == 'FAILED':
                                failed_scenario_runs_count += 1
                            else:
                                break
                                
                        if failed_scenario_runs_count == scenario_runs_count:
                            print(project_key + ' - ' + scenario['id'] + ' failed_scenario_runs_count: ' + str(failed_scenario_runs_count) + ' - ' + 'scenario_runs_by_date_count: ' + str(scenario_runs_by_date_count))
                
            except ValueError:
                continue
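
The type() and dir() tip mentioned in this answer can be tried without a DSS instance. The following is only a minimal sketch: `public_methods` is a hypothetical helper (not part of the Dataiku API), and an empty list stands in for whatever a list-returning API call gives back. It shows why calling `get_last_finished_run()` on a list item or on the list itself fails: the method lives on the scenario handle, not on the list.

```python
def public_methods(obj):
    """Return the sorted names of the public callable attributes of obj.

    Hypothetical helper for illustration; it only wraps the built-in dir().
    """
    return sorted(
        name
        for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name, None))
    )


# A plain list stands in for the return value of a list-style API call.
scenarios = []

# type() tells you what kind of object you are holding.
print(type(scenarios).__name__)

# dir() (filtered here) tells you which methods that object offers.
print(public_methods(scenarios))

# The scenario-handle method from the answer is not available on a list.
print("get_last_finished_run" in public_methods(scenarios))
```

Running the same helper on a real handle, for example the object returned by `project_handle.get_scenario(...)`, should list the methods that can actually be called on it.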

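For the original goal (a table of failed runs over a recent time window), one possible adaptation of the code above is sketched here. It is not a tested solution: `collect_failed_runs` is a hypothetical helper, it uses only calls that already appear in this thread (`list_project_keys`, `get_project`, `list_scenarios`, `get_scenario`, `get_runs_by_date`, `get_info`), and it assumes that `client` behaves like `dataiku.api_client()`. Where exactly the error message sits inside the run's result is not confirmed here, so the whole result dictionary is kept for inspection.

```python
def collect_failed_runs(client, since, until):
    """Return one dict per failed scenario run found between since and until.

    client is assumed to behave like dataiku.api_client(); since and until
    are passed straight through to get_runs_by_date().
    """
    rows = []
    for project_key in client.list_project_keys():
        project = client.get_project(project_key)
        for scenario in project.list_scenarios():
            # list_scenarios() returns plain dicts, so fetch a handle by id.
            handle = project.get_scenario(scenario["id"])
            for run in handle.get_runs_by_date(since, until):
                # Assumption: "result" may be missing, e.g. for unfinished runs.
                result = run.get_info().get("result") or {}
                if result.get("outcome") == "FAILED":
                    rows.append({
                        "projectKey": project_key,
                        "scenarioId": scenario["id"],
                        "outcome": result["outcome"],
                        # Keep the raw result so error details can be inspected.
                        "result": result,
                    })
    return rows
```

Inside DSS this could be called as `collect_failed_runs(dataiku.api_client(), datetime.now() - timedelta(hours=24), datetime.now())`, and the returned list passed to `pd.DataFrame(...)`. Whether `get_runs_by_date` honours the time of day or only the calendar date is not confirmed in this thread, so the window may need adjusting.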