Get last scenario and jobs event per project quickly

tomas Registered, Neuron 2022 Posts: 120 ✭✭✭✭✭

Hi Dataiku users and experts,

I need to very quickly check, across all projects (around 300), whether any job was executed, or any scenario was triggered and executed, in the last couple of minutes (the window is configurable). I did not find anything other than the project list_jobs method from the public API.

However, iterating through all projects and calling list_jobs takes too much time (2-3 sec).

I am thinking of writing a watcher on top of the scenarios and jobs folders in the DSS data dir, to catch and cache the latest job/scenario execution per project.

Or does anybody have a better idea?

Thanks

Answers

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer Posts: 753 Dataiker

    Hi,

    We'd advise creating an "internal stats" dataset on the "jobs" or "scenarios" view.

    Then in your Python code, use get_dataframe() on this dataset, and filter the rows by the time.
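The filtering step described above could look roughly like the sketch below. It runs on a synthetic DataFrame standing in for the result of get_dataframe(), so it does not need a DSS instance. The column name `project_key` and the helper `latest_recent_runs` are assumptions made for illustration; `time_start` is the column used later in this thread. Check the actual schema of your internal stats dataset before relying on it.

```python
from datetime import datetime, timedelta

import pandas as pd


def latest_recent_runs(df, now, window=timedelta(minutes=1),
                       project_col="project_key", time_col="time_start"):
    """Return the most recent start time per project within `window` of `now`.

    `project_col` is an assumed column name. `now` and the values in
    `time_col` must use the same timezone convention (e.g. both naive UTC).
    """
    # Keep only the rows that started inside the time window.
    recent = df[df[time_col] >= now - window]
    # Reduce to one row per project: the latest start time.
    return recent.groupby(project_col)[time_col].max()


# Synthetic stand-in for the frame returned by get_dataframe().
now = datetime(2022, 6, 1, 12, 0, 0)
runs = pd.DataFrame({
    "project_key": ["A", "A", "B", "C"],
    "time_start": pd.to_datetime([
        "2022-06-01 11:59:30", "2022-06-01 11:59:50",
        "2022-06-01 11:50:00", "2022-06-01 11:59:10",
    ]),
})
latest = latest_recent_runs(runs, now)
print(latest)
```

Projects with no run inside the window (here, B) simply do not appear in the result, so checking whether a project was active reduces to an index lookup.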

  • tomas
    tomas Registered, Neuron 2022 Posts: 120 ✭✭✭✭✭
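    import dataiku
    from datetime import datetime, timedelta, timezone

    scenario_runs = dataiku.Dataset("dss_scenario_runs")
    sf = scenario_runs.get_dataframe()
    # The view stores the start time in UTC, so compare against the current
    # UTC time (kept naive, as before) instead of hard-coding a 2h offset
    last_1m = datetime.now(timezone.utc).replace(tzinfo=None) - timedelta(seconds=60)
    sf[sf['time_start'] >= last_1m]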

    This already takes 2, sometimes 3, seconds to query. Is it possible to make it faster?

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer Posts: 753 Dataiker

    Hi,

    No, it isn't possible to get this any faster. I hadn't understood that the 2-3 seconds in your original post was the total time, not the time per project.
