Automated Scenario Documentation

aazariaz Registered Posts: 4 ✭✭✭

Hi all!

My team is putting together some documentation to keep track of our scenario schedules across all of our projects and flows, and other general scenario information like datasets being built, if there's an email output, etc.

Right now we're putting this together manually in Excel, but we were wondering if there was any other tool or feature in Dataiku that could automatically provide this kind of documentation for us?

Thanks!

Answers

  • Keiji
    Keiji Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 52 Dataiker
    edited July 17

    Hello @aazariaz,

    Thank you so much for your post on Community.

    You can retrieve your scenarios' information automatically with Python code that uses the Dataiku Python APIs.

    Here is sample code that prints the triggers, reporters, and steps of every scenario in every project:

    import dataiku
    from dataikuapi.dss.scenario import StepBasedScenarioSettings
    
    # Get an API client.
    client = dataiku.api_client()
    
    # Get project keys.
    project_keys = client.list_project_keys()
    for project_key in project_keys:
        print('project_key:', project_key)
        # Get a project.
        project = client.get_project(project_key)
        # Get scenarios in the project.
        scenarios = project.list_scenarios(as_type='objects')
        for scenario in scenarios:
            print('  scenario.id:', scenario.id)
            settings = scenario.get_settings()
            # Get triggers of the scenario.
            raw_triggers = settings.raw_triggers
            print('    raw_triggers:', raw_triggers)
            # Get reporters of the scenario.
            raw_reporters = settings.raw_reporters
            print('    raw_reporters:', raw_reporters)
            if isinstance(settings, StepBasedScenarioSettings):
                # Get steps of the scenario.
                raw_steps = settings.raw_steps
                print('    raw_steps:', raw_steps)
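
    Since the goal is to replace a hand-maintained Excel sheet, the printed values above could also be flattened into one row per scenario and written out as CSV. The following is only a minimal sketch under stated assumptions: `summarize_scenario` and `rows_to_csv` are hypothetical helper names, the `type` and `messaging` keys read from the raw dictionaries are assumptions about their layout (check them against your own printed output), and the sample values are made up so the snippet runs without a DSS instance.

```python
import csv
import io


def summarize_scenario(project_key, scenario_id, raw_triggers, raw_reporters, raw_steps=None):
    """Flatten one scenario's raw settings into a single report row (a dict)."""
    # ASSUMPTION: each trigger and step dict has a 'type' key, and each reporter
    # dict has a nested 'messaging' dict with a 'type' key. .get() is used so the
    # sketch degrades to '?' instead of failing if a key is absent.
    return {
        'project_key': project_key,
        'scenario_id': scenario_id,
        'trigger_types': ';'.join(str(t.get('type', '?')) for t in raw_triggers),
        'reporter_types': ';'.join(
            str(r.get('messaging', {}).get('type', '?')) for r in raw_reporters
        ),
        # raw_steps is None for scenarios that are not step-based.
        'step_types': ';'.join(str(s.get('type', '?')) for s in (raw_steps or [])),
    }


def rows_to_csv(rows):
    """Serialize a list of report rows to CSV text; returns '' for no rows."""
    if not rows:
        return ''
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data standing in for settings.raw_triggers,
# settings.raw_reporters and settings.raw_steps from the loop above.
sample_row = summarize_scenario(
    'DEMO_PROJECT',
    'nightly_build',
    raw_triggers=[{'type': 'temporal', 'name': 'Every night'}],
    raw_reporters=[{'messaging': {'type': 'mail-scenario'}}],
    raw_steps=[{'type': 'build_flowitem', 'name': 'Build outputs'}],
)
report_csv = rows_to_csv([sample_row])
print(report_csv)
```

    In the loop above, you would call `summarize_scenario(project_key, scenario.id, raw_triggers, raw_reporters, raw_steps)` for each scenario (passing `raw_steps` only for step-based scenarios), collect the rows in a list, and write the result of `rows_to_csv` to a file that Excel can open.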

    Please see the following DSS documentation for the details of the APIs.

    I hope this helps. Please let us know if you have any further questions.

    Sincerely,
    Keiji, Dataiku Technical Support

  • Manuel
    Manuel Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 193 ✭✭✭✭✭✭✭

    Hi,

    Most of the information you list is available on the user interface:

    • Click the applications menu on the top bar (the nine-dot grid icon) > Automation monitoring;
    • There are pages with a daily summary, a timeline, triggers, and reporters;
    • See the attached images.

    Are you using an Automation node already? Design and Automation activities are very different in nature:

    • Whilst the Automation workload is predictable, the Design workload is unpredictable because it depends on what each user is doing;
    • Thus, you should aim to separate these workloads, so that the Design workload does not negatively affect the Automation workload;
    • Another reason for separating the workloads is optimisation: optimising resources for 24-hour automation is very different from optimising resources for design work during working hours.

    Another consideration is the deployment model. Are your users allowed to promote their own projects to automation, or do you have dedicated Deployers in your organisation? The former enables autonomy, whilst the latter enables control of the automation workload.

    I hope this helps.
