Hi,
I want to fetch last_modified_on and last_run_date for all notebooks in DSS using the Python API.
Please help.
Thanks,
Prince
Hi,
Starting with DSS 9, we introduced a new list_jupyter_notebooks method, which returns some of the information you are looking for.
Below are two samples that go through all projects and get the available metadata about the notebooks, including the last-modified time.
However, last_run_date is not available. The closest thing is kernelLastActivityTime, which is available only for currently active notebooks via get_sessions().
1) Get last_modified_on for all notebooks in all projects
import dataiku

client = dataiku.api_client()
projects = client.list_projects()
for p in projects:
    proj = client.get_project(p["projectKey"])
    all_notebooks_active = proj.list_jupyter_notebooks(active=True, as_type="listitems")
    all_notebooks_inactive = proj.list_jupyter_notebooks(active=False, as_type="listitems")
    for notebook in all_notebooks_inactive:
        notebook_name = notebook['name']
        # Uncomment to fetch additional metadata stored in the notebook itself
        #additional_notebook_details = proj.get_jupyter_notebook(notebook_name).get_content().get_metadata()
        print(notebook)
    for notebook_active in all_notebooks_active:
        print(notebook_active)
2) Get kernelLastActivityTime (only available for active sessions)
import dataiku

client = dataiku.api_client()
projects = client.list_projects()
for p in projects:
    proj = client.get_project(p["projectKey"])
    # active=True restricts the list to notebooks that are currently running
    all_notebooks_active = proj.list_jupyter_notebooks(active=True, as_type="listitems")
    for notebook in all_notebooks_active:
        sessions = proj.get_jupyter_notebook(notebook['name']).get_sessions()
        print(sessions)
Hope this helps!
Hi Alex,
How can we fetch this information in DSS 8?
Hi Alex,
For some reason we can't upgrade DSS right now.
Is the last-modified information available in the git history logs? If yes, how can we fetch it?
Unfortunately, the "get_jupyter_notebook" and "list_jupyter_notebooks" methods, which return notebook objects from the project object, were added to the project API in DSS 9, so you will need to upgrade to use them. In DSS 8 you can use list_running_notebooks(), but this is limited to active notebooks only. Here is an example:
import datetime

import dataiku

client = dataiku.api_client()
projects = client.list_projects()
for p in projects:
    proj = client.get_project(p["projectKey"])
    for notebook in proj.list_running_notebooks():
        state = notebook.get_state()
        # Timestamps are epoch milliseconds, hence the division by 1000
        last_modified = datetime.datetime.fromtimestamp(state["lastModifiedOn"] / 1000)
        for activesession in state["activeSessions"]:
            session_start_time = datetime.datetime.fromtimestamp(round(activesession["sessionStartTime"] / 1000))
            print(state["name"], session_start_time, last_modified)
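The example above divides sessionStartTime and lastModifiedOn by 1000, which suggests both are epoch timestamps in milliseconds. As a minimal sketch under that assumption, the conversion can be pulled into small helpers that work on a plain state dictionary. The function names and the sample dictionary are hypothetical and only reuse the keys shown above; they are not part of the Dataiku API.

```python
from datetime import datetime, timezone


def millis_to_datetime(epoch_millis):
    """Convert an epoch timestamp in milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)


def latest_session_start(state):
    """Return the most recent sessionStartTime among activeSessions, or None if there are none."""
    starts = [s["sessionStartTime"] for s in state.get("activeSessions", [])]
    return millis_to_datetime(max(starts)) if starts else None


# Hypothetical state dictionary, shaped like the keys used in the example above.
sample_state = {
    "name": "alex's Python notebook",
    "lastModifiedOn": 1627402094000,
    "activeSessions": [
        {"sessionStartTime": 1627401600000},
        {"sessionStartTime": 1627401900000},
    ],
}

print(sample_state["name"],
      millis_to_datetime(sample_state["lastModifiedOn"]),
      latest_session_start(sample_state))
```

Using UTC-aware datetimes avoids results that depend on the time zone of the machine running the script.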
You can run git log in the local git repository inside a project directory:
git log --grep='Saved Jupyter notebook' --pretty=format:"%h%x09%an%x09%ad%x09%s" --date=iso-local
90bc0b1 alex 2021-07-27 16:08:14 +0000 Saved Jupyter notebook 'alex's Python notebook'
56935ae alex 2021-07-27 16:07:21 +0000 Saved Jupyter notebook 'alex's Python notebook'
But finding the last commit for a particular file can prove tricky. You would need to iterate over all projects, then likely create a dataset out of this data and use a Group By recipe to get the last date for each notebook.
You can also see the last modification time of a notebook file directly from the command line:
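If the git log output is captured as text, the "last date per notebook" grouping can also be done in plain Python. The sketch below assumes the exact tab-separated format produced by the git log command above (hash, author, date, subject) and that the subject always looks like Saved Jupyter notebook '<name>'. The function name is hypothetical.

```python
from datetime import datetime

SUBJECT_PREFIX = "Saved Jupyter notebook '"


def last_saved_per_notebook(git_log_text):
    """Map each notebook name to the datetime of its most recent 'Saved Jupyter notebook' commit."""
    latest = {}
    for line in git_log_text.splitlines():
        parts = line.split("\t", 3)
        if len(parts) != 4:
            continue  # skip lines that do not match the expected 4-column format
        _commit, _author, date_text, subject = parts
        if not (subject.startswith(SUBJECT_PREFIX) and subject.endswith("'")):
            continue
        name = subject[len(SUBJECT_PREFIX):-1]
        # --date=iso-local prints dates like "2021-07-27 16:08:14 +0000"
        saved_at = datetime.strptime(date_text, "%Y-%m-%d %H:%M:%S %z")
        if name not in latest or saved_at > latest[name]:
            latest[name] = saved_at
    return latest


# The two sample lines from the git log output above, with tabs between columns.
sample_log = (
    "90bc0b1\talex\t2021-07-27 16:08:14 +0000\tSaved Jupyter notebook 'alex's Python notebook'\n"
    "56935ae\talex\t2021-07-27 16:07:21 +0000\tSaved Jupyter notebook 'alex's Python notebook'"
)

for name, saved_at in last_saved_per_notebook(sample_log).items():
    print(name, saved_at.isoformat())
```

To cover every project, the same function could be applied to the output of the git log command run in each project directory.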
[dataiku@ip-172-31-10-169 projects]$ for d in */ ; do echo $d; ls -lrt --time-style="full-iso" "$d"/ipython_notebooks/ | grep ipynb; done
TESTING/
-rw-r--r--. 1 dataiku dataiku 19592 2021-07-27 16:08:14.623520014 +0000 alex's Python notebook.ipynb
-rw-r--r--. 1 dataiku dataiku 19592 2021-07-27 16:08:14.708517196 +0000 alex's Python notebook-Copy1.ipyn
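The same file-modification check can be sketched in Python for anyone who prefers a script to the shell loop. It assumes the layout shown above, where each project directory contains an ipython_notebooks folder holding the .ipynb files, and that the script runs on the DSS server with read access to that directory. The function name is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def notebook_mtimes(projects_dir):
    """Return (project_key, notebook_name, modified_utc) for every .ipynb under projects_dir."""
    results = []
    for path in sorted(Path(projects_dir).glob("*/ipython_notebooks/*.ipynb")):
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        project_key = path.parent.parent.name  # <projects_dir>/<project_key>/ipython_notebooks/<file>
        results.append((project_key, path.stem, modified))
    return results
```

Calling notebook_mtimes(".") from the projects directory would list one row per notebook. Note that the file modification time reflects the last save on disk, not the last time the notebook was run.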