Get last scenario and jobs event per project quickly
Hi Dataiku users and experts,
I need to check very quickly, across all projects (around 300), whether any job was executed in the last couple of minutes (configurable) or any scenario was triggered and executed. I did not find anything other than the project list_jobs method from the public API.
However, iterating through all projects and calling list_jobs takes too much time (2-3 sec).
I am thinking of writing a watcher on top of the scenarios and jobs folders in the DSS data dir, to catch and cache the latest job/scenario execution per project.
Or does anybody have a better idea?
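For illustration, the watcher idea could start as a simple polling scan rather than a real file-system watcher. This is only a rough sketch: it assumes the watched folder contains one subfolder per project and that folder modification times reflect new runs, neither of which is confirmed here. The function names are hypothetical, and deeper nesting (for example one level per scenario) would need an extra loop.

```python
import os
import time


def latest_activity_per_project(root_dir):
    """Map each project subfolder name to the newest mtime (epoch seconds)
    found on the folder itself or on its direct entries."""
    latest = {}
    with os.scandir(root_dir) as projects:
        for project in projects:
            if not project.is_dir():
                continue
            newest = project.stat().st_mtime
            with os.scandir(project.path) as runs:
                for run in runs:
                    newest = max(newest, run.stat().st_mtime)
            latest[project.name] = newest
    return latest


def recently_active(latest, window_seconds, now=None):
    """Return the sorted project names whose newest mtime falls within the window."""
    now = time.time() if now is None else now
    return sorted(name for name, mtime in latest.items()
                  if now - mtime <= window_seconds)
```

The result of latest_activity_per_project can be cached and refreshed on a timer, so that the "was anything active in the last N seconds" question is answered from memory.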
Thanks
Answers
-
Hi,
We'd advise creating an "internal stats" dataset on the "jobs" or "scenarios" view.
Then in your Python code, use get_dataframe() on this dataset, and filter the rows by time.
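As a minimal sketch of the filtering step, assuming the dataframe has a time_start column holding UTC timestamps and a project column (called project_key here, which is a guess at the name, not a confirmed column of the view), the latest start per project within a configurable window could be computed like this. The helper name is hypothetical, and now must be timezone-aware if passed explicitly.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd


def last_start_per_project(runs, window_seconds, now=None):
    """Keep rows started within the window and return the latest start per project."""
    now = datetime.now(timezone.utc) if now is None else now
    cutoff = pd.Timestamp(now) - timedelta(seconds=window_seconds)
    # Parse as UTC so the comparison does not depend on the local timezone.
    starts = pd.to_datetime(runs["time_start"], utc=True)
    recent = runs.assign(time_start=starts)[starts >= cutoff]
    # "project_key" is an assumed column name; adjust to the actual view.
    return recent.groupby("project_key")["time_start"].max()
```

Projects with no run inside the window simply do not appear in the returned Series.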
-
from datetime import datetime, timedelta

import dataiku

scenario_runs = dataiku.Dataset("dss_scenario_runs")
sf = scenario_runs.get_dataframe()
# The view stores the start time in UTC and the local clock is assumed to be
# at UTC+2, so shift the local cutoff back by 2 h on top of the 60 s window.
last_1m = datetime.now() - timedelta(seconds=2 * 3600 + 60)
sf[sf['time_start'] >= last_1m]

This already takes 2, sometimes 3, seconds to query. Is it possible to get it faster?
-
Hi,
No, it isn't possible to get this any faster. I hadn't understood that your previous comment about 2-3 seconds referred to the total time, not the time per project.