
Get last scenario and jobs event per project quickly

Hi Dataiku users and experts,

I need to check very quickly, across all projects (around 300), whether any job was executed or any scenario was triggered and run in the last couple of minutes (the window should be configurable). I did not find anything other than the project list_jobs method from the public API.

However, iterating through all projects and calling list_jobs takes too long (2-3 seconds).
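One way to shorten a loop like this is to issue the per-project calls concurrently. The following is only a sketch: `recently_active_projects` and the `fetch_last_start_ms` callable are hypothetical names, and the callable stands in for whatever wraps `list_jobs` for one project and returns the newest job's start time in epoch milliseconds. Whether threads actually help depends on how the DSS backend handles concurrent requests, which has not been measured here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def recently_active_projects(project_keys, fetch_last_start_ms,
                             window_seconds=120, max_workers=16, now_ms=None):
    """Return the project keys whose newest job started within the window.

    fetch_last_start_ms is a hypothetical callable: it takes a project key and
    returns the newest job's start time in epoch milliseconds, or None when
    the project has no jobs. In DSS it would wrap a list_jobs call.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff_ms = now_ms - window_seconds * 1000
    keys = list(project_keys)
    # Run the per-project lookups in a thread pool; map() keeps input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        last_starts = list(pool.map(fetch_last_start_ms, keys))
    return [key for key, ts in zip(keys, last_starts)
            if ts is not None and ts >= cutoff_ms]


# Usage with a fake lookup table instead of a live DSS instance.
fake_last_starts = {"A": 1_000_000, "B": 880_000, "C": None}
active = recently_active_projects(["A", "B", "C"], fake_last_starts.get,
                                  window_seconds=60, now_ms=1_030_000)
```

Injecting the lookup as a callable keeps the sketch independent of the exact fields returned by list_jobs, which should be checked against the API documentation.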

I am thinking of writing a watcher on top of the scenarios and jobs folders in the DSS data dir, to catch and cache the latest job/scenario execution per project.
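A simple polling variant of this idea could look like the sketch below. It assumes a layout of `<root>/<PROJECT_KEY>/<job or run folder>`, which is an assumption about the data dir and not a documented contract, so the real layout should be checked on the instance first. The function names are hypothetical.

```python
import os
import time


def latest_activity_per_project(root_dir):
    """Map each project sub-folder of root_dir to the newest mtime of its entries.

    Assumes root_dir/<PROJECT_KEY>/<job or run folder>; this layout is an
    assumption, not something guaranteed by DSS.
    """
    latest = {}
    with os.scandir(root_dir) as projects:
        for project in projects:
            if not project.is_dir():
                continue
            with os.scandir(project.path) as entries:
                mtimes = [entry.stat().st_mtime for entry in entries]
            if mtimes:  # skip projects without any job or run folder
                latest[project.name] = max(mtimes)
    return latest


def active_since(latest, window_seconds, now=None):
    """Return the sorted project keys whose newest entry falls inside the window."""
    now = time.time() if now is None else now
    return sorted(key for key, mtime in latest.items()
                  if mtime >= now - window_seconds)
```

Calling `latest_activity_per_project` on a schedule and keeping the returned dict in memory would give the cache described above. Modification times only indicate that something touched a folder, so this is a heuristic and not a record of job status.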

Or does anybody have a better idea?

Thanks

 

3 Replies
Dataiker

Hi,

We'd advise creating an "internal stats" dataset on the "jobs" or "scenarios" view.

Then, in your Python code, call get_dataframe() on this dataset and filter the rows by time.

Author
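import dataiku
from datetime import datetime, timedelta

# Load the internal stats dataset built on the scenario runs
scenario_runs = dataiku.Dataset("dss_scenario_runs")
sf = scenario_runs.get_dataframe()
# The view stores the start time in UTC, so we shift the local time (UTC+2 here) back by 2h, plus the 60 s window
last_1m = datetime.now() - timedelta(seconds=2 * 3600 + 60)
sf[sf['time_start'] >= last_1m]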

This already takes 2, sometimes 3, seconds to query. Is it possible to make it faster?
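A side note on the cutoff in the snippet above: the hard-coded 2h shift only holds while the local clock is at UTC+2 and would be off after a daylight saving change. A minimal sketch of a cutoff that does not depend on the local offset, assuming the time_start column holds naive datetimes in UTC (`utc_cutoff` is a hypothetical helper name):

```python
from datetime import datetime, timedelta, timezone


def utc_cutoff(window_seconds, now=None):
    """Return a naive datetime, expressed in UTC, window_seconds before now.

    now must be timezone-aware when given; it defaults to the current time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Convert to UTC, then drop tzinfo so the value compares with naive UTC data.
    utc_naive = now.astimezone(timezone.utc).replace(tzinfo=None)
    return utc_naive - timedelta(seconds=window_seconds)


# Usage: a fixed "now" at UTC+2 to show that the offset is handled.
example_now = datetime(2020, 6, 1, 14, 0, 30, tzinfo=timezone(timedelta(hours=2)))
example_cutoff = utc_cutoff(60, now=example_now)
```

The result could replace last_1m in the filter. This changes correctness only, not the query time.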

Dataiker

Hi,

No, it isn't possible to get this any faster. I hadn't understood that your previous comment about 2-3 seconds referred to the total time, not the time per project.
