Get last scenario and jobs event per project quickly

tomas · 08-25-2020

Hi dataiku users and experts,

I need to evaluate very quickly, across all projects (around 300), whether a job was executed or a scenario was triggered and executed in the last couple of minutes (the window is configurable). I did not find anything other than the project list_jobs method in the public API.

However, iterating through all projects and calling list_jobs takes too much time (2-3 sec).

I am thinking of writing a watcher on top of the scenarios and jobs folders in the DSS data dir, to catch and cache the latest job/scenario execution per project.
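A minimal sketch of that idea, using periodic polling of directory modification times rather than a true filesystem watcher. It assumes (this is not confirmed anywhere in the thread) that the data dir holds one folder per project key under jobs/ and scenarios/, and that a new job or scenario run creates a new entry one or two levels below it. The function names latest_activity and recently_active are made up for illustration.

```python
import os
import time


def latest_activity(root):
    """Return {project_key: newest mtime} for each project folder under root.

    Looks at the project folder itself and its direct children only.
    Assumed layout: root/PROJECT_KEY/<job or scenario folder>/...
    """
    latest = {}
    if not os.path.isdir(root):
        return latest
    with os.scandir(root) as projects:
        for project in projects:
            if not project.is_dir():
                continue
            newest = project.stat().st_mtime
            with os.scandir(project.path) as children:
                for child in children:
                    newest = max(newest, child.stat().st_mtime)
            latest[project.name] = newest
    return latest


def recently_active(roots, window_seconds, now=None):
    """Return the project keys with activity in the last window_seconds."""
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    active = set()
    for root in roots:
        for project_key, mtime in latest_activity(root).items():
            if mtime >= cutoff:
                active.add(project_key)
    return active
```

It could be called with something like recently_active([data_dir + "/jobs", data_dir + "/scenarios"], 120), where data_dir is the path to the DSS data dir. Directory mtimes only change when entries are added or removed, so this would detect new runs but not progress inside an existing run.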

Or does anybody have a better idea?

Thanks

Clément_Stenac · 08-25-2020

Hi,

We'd advise creating an "internal stats" dataset on the "jobs" or "scenarios" view.

Then in your Python code, use get_dataframe() on this dataset, and filter the rows by the time.

tomas · 08-25-2020

from datetime import datetime, timedelta
import dataiku

scenario_runs = dataiku.Dataset("dss_scenario_runs")
sf = scenario_runs.get_dataframe()
# time_start in the view is in UTC and local time here is UTC+2,
# so shift the local time back by 2h (plus the 60 s window)
last_1m = datetime.now() - timedelta(seconds=2*3600 + 60)
sf[sf['time_start'] >= last_1m]

This already takes 2, sometimes 3, seconds to query. Is it possible to get it faster?
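A side note on the hard-coded 2h shift above: it would go stale whenever the local UTC offset changes (daylight saving time, for example). A small sketch of computing the cutoff directly in UTC follows. The helper name utc_cutoff is hypothetical, and it assumes time_start comes back as naive UTC timestamps, as the comparison in the snippet above suggests.

```python
from datetime import datetime, timedelta, timezone


def utc_cutoff(window_seconds, now=None):
    """Return a naive UTC datetime window_seconds before now.

    now, if given, must be timezone-aware; it defaults to the current time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now.astimezone(timezone.utc) - timedelta(seconds=window_seconds)
    # Drop tzinfo so the value compares against naive UTC timestamps
    return cutoff.replace(tzinfo=None)
```

The filter would then read sf[sf['time_start'] >= utc_cutoff(60)]. This only removes the offset arithmetic; it does not make the query itself any faster.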

Clément_Stenac · 08-25-2020

Hi,

No, it isn't possible to get this any faster. I hadn't understood that your previous comment on 2-3 seconds was about the total time, not per project.
