Automated Scenario Documentation

aazariaz
Level 2

Hi all!

My team is putting together some documentation to keep track of our scenario schedules across all of our projects and flows, along with other general scenario information such as which datasets are being built, whether there's an email output, etc.

Right now we're putting this together manually in Excel, but we were wondering if there was any other tool or feature in Dataiku that could automatically provide this kind of documentation for us?

Thanks!

2 Replies
KeijiY
Dataiker

Hello @aazariaz,

Thank you so much for your post on Community.

You should be able to write Python code that uses the Dataiku Python APIs to retrieve your scenarios' information automatically.

Here is sample code that prints the triggers, reporters, and steps of every scenario in every project:

import dataiku
from dataikuapi.dss.scenario import StepBasedScenarioSettings

# Get an API client.
client = dataiku.api_client()

# Get project keys.
project_keys = client.list_project_keys()
for project_key in project_keys:
    print('project_key:', project_key)
    # Get a project.
    project = client.get_project(project_key)
    # Get scenarios in the project.
    scenarios = project.list_scenarios(as_type='objects')
    for scenario in scenarios:
        print('  scenario.id:', scenario.id)
        settings = scenario.get_settings()
        # Get triggers of the scenario.
        raw_triggers = settings.raw_triggers
        print('    raw_triggers:', raw_triggers)
        # Get reporters of the scenario.
        raw_reporters = settings.raw_reporters
        print('    raw_reporters:', raw_reporters)
        if isinstance(settings, StepBasedScenarioSettings):
            # Get steps of the scenario.
            raw_steps = settings.raw_steps
            print('    raw_steps:', raw_steps)
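
Since the goal is to replace a manually maintained Excel sheet, the printed output above could instead be flattened into rows and written as CSV. The following is only a minimal sketch: the helper names (`summarize_scenario`, `rows_to_csv`) are hypothetical, and the dictionary keys `type`, `name`, and `active` are assumptions about what the raw trigger, reporter, and step definitions contain. It runs on hand-written sample data so it can be tried outside DSS.

```python
import csv
import io

FIELDS = ["project_key", "scenario_id", "kind", "type", "name", "active"]


def summarize_scenario(project_key, scenario_id, raw_triggers, raw_reporters, raw_steps=None):
    """Flatten one scenario into one row per trigger, reporter, and step."""
    rows = []
    groups = (
        ("trigger", raw_triggers),
        ("reporter", raw_reporters),
        ("step", raw_steps or []),  # script-based scenarios have no steps
    )
    for kind, items in groups:
        for item in items:
            rows.append({
                "project_key": project_key,
                "scenario_id": scenario_id,
                "kind": kind,
                # Assumed key names; .get() leaves the cell blank if a key is absent.
                "type": item.get("type", ""),
                "name": item.get("name", ""),
                "active": item.get("active", ""),
            })
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data standing in for the raw_* values printed above.
sample_rows = summarize_scenario(
    "MYPROJECT",
    "daily_build",
    raw_triggers=[{"type": "temporal", "name": "Every day", "active": True}],
    raw_reporters=[{"name": "Email on failure", "active": True}],
    raw_steps=[{"type": "build_flowitem", "name": "Build output"}],
)
print(rows_to_csv(sample_rows))
```

Inside the loop above, the same helper could be called with `project_key`, `scenario.id`, `settings.raw_triggers`, `settings.raw_reporters`, and (for step-based scenarios) `settings.raw_steps`, collecting the rows from every scenario before writing a single file.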

 

Please see the following DSS documentation for the details of the APIs.

I hope this helps. Please let us know if you have any further questions.

Sincerely,
Keiji, Dataiku Technical Support

Manuel
Dataiker Alumni

Hi,

Most of the information you list is available on the user interface:

  • Click the menu on the top bar (the 9-dot grid) > Automation monitoring;
  • There are pages with a daily summary, a timeline, triggers, and reporters;
  • See the attached images.

Are you using an Automation node already? Design and Automation activities are very different in nature:

  • Whilst the Automation workload is predictable, the Design workload is unpredictable, as it depends on what each user is doing;
  • Thus, you should aim to separate these workloads so that the latter has no negative impact on the former;
  • Another reason for separating the workloads is optimisation: optimising resources for round-the-clock automation is very different from optimising them for design work during working hours.

Another consideration is the deployment model. Are your users allowed to promote their own projects to automation, or do you have dedicated Deployers in your organisation? The former enables autonomy, whilst the latter enables control of the automation workload.

I hope this helps.
