
Get last scenario and jobs event per project quickly

tomas
Level 5

Hi Dataiku users and experts,

I need to check very quickly, across all projects (around 300), whether any job was executed or any scenario was triggered and run in the last few minutes (the window should be configurable). I did not find anything other than the project list_jobs method in the public API.

However, iterating through all projects and calling list_jobs takes too much time (2-3 sec).

I am thinking of writing a watcher on top of the scenarios and jobs folders in the DSS data dir, to catch and cache the latest job/scenario execution per project.

Or does anybody have a better idea?

Thanks
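
The folder-watching idea from the question could be sketched roughly as below. This is a simple polling scan, not an event-based watcher, and it assumes a layout of one sub-folder per project key under the scanned root (for example a jobs folder inside the DSS data dir), with one entry per job or scenario run inside it. That layout, and the function names latest_activity and active_projects, are assumptions made for illustration and are not confirmed anywhere in this thread.

```python
import os
import time


def latest_activity(root):
    """Map each project sub-folder of `root` to the newest mtime among its entries.

    Assumed (hypothetical) layout: <root>/<PROJECT_KEY>/<one entry per run>.
    """
    latest = {}
    with os.scandir(root) as projects:
        for project in projects:
            if not project.is_dir():
                continue
            with os.scandir(project.path) as entries:
                mtimes = [entry.stat().st_mtime for entry in entries]
            if mtimes:  # skip projects that have no entries at all
                latest[project.name] = max(mtimes)
    return latest


def active_projects(root, window_seconds, now=None):
    """Return the sorted project keys with an entry modified within the window."""
    now = time.time() if now is None else now
    return sorted(
        key for key, mtime in latest_activity(root).items()
        if now - mtime <= window_seconds
    )
```

A directory's modification time only changes when entries are added or removed directly inside it, so this picks up newly created run folders but not necessarily later writes deeper in the tree.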

 

3 Replies
Clément_Stenac
Dataiker

Hi,

We'd advise creating an "internal stats" dataset on the "jobs" or "scenarios" view.

Then in your Python code, use get_dataframe() on this dataset, and filter the rows by the time.
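
A minimal sketch of the time filter on the resulting dataframe, assuming a time_start column holding UTC timestamps. The helper name recent_rows and the project_key column in the usage example are hypothetical; inside DSS the dataframe would come from get_dataframe() on the internal stats dataset.

```python
from datetime import timedelta

import pandas as pd


def recent_rows(df, window, time_col="time_start", now=None):
    """Return the rows of df whose time_col lies within `window` before `now`."""
    # Normalize to timezone-aware UTC; naive values are assumed to already be UTC.
    times = pd.to_datetime(df[time_col], utc=True)
    now = pd.Timestamp.now(tz="UTC") if now is None else pd.Timestamp(now)
    if now.tzinfo is None:
        now = now.tz_localize("UTC")
    return df[times >= now - window]


# Usage with a synthetic dataframe (column names other than time_start are made up)
runs = pd.DataFrame({
    "project_key": ["A", "B", "C"],
    "time_start": pd.to_datetime(
        ["2024-01-01 11:59:30", "2024-01-01 11:50:00", "2024-01-01 11:59:59"]
    ),
})
reference = pd.Timestamp("2024-01-01 12:00:00", tz="UTC")
recent = recent_rows(runs, timedelta(minutes=1), now=reference)
```

Comparing in UTC on both sides avoids hard-coding the server's offset from UTC, which would otherwise change with daylight saving time.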

tomas
Level 5
Author
import dataiku
from datetime import datetime, timedelta

scenario_runs = dataiku.Dataset("dss_scenario_runs")
sf = scenario_runs.get_dataframe()
# time_start is in UTC and local time here is UTC+2, so subtract 2h plus the 60 s window
last_1m = datetime.now() - timedelta(seconds=2*3600 + 60)
recent_runs = sf[sf['time_start'] >= last_1m]

This query already takes 2, sometimes 3, seconds. Is it possible to make it faster?

Clément_Stenac
Dataiker

Hi,

No, it isn't possible to get this any faster. I hadn't understood that the 2-3 seconds in your previous comment was the total time, not the time per project.
