How to fetch notebook details
Hi,
I want to fetch last_modified_on and last_run_date for all notebooks in DSS using the Python API.
Please help
Thanks,
Prince
Answers
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,215 Dataiker
Hi,
Starting with DSS 9, we introduced a new list_jupyter_notebooks() method, which provides some of the information you are looking for.
Below are samples that go through all projects and get the available metadata about the notebooks, including last_modified.
However, the last_run_date is not available. The closest thing would be the kernelLastActivityTime which is available only for currently active notebooks with get_sessions().
1) Get last_modified_on for all notebooks in all projects

import dataiku
from dataiku import pandasutils as pdu
import pandas as pd

client = dataiku.api_client()
projects = client.list_projects()

for p in projects:
    proj = client.get_project(p["projectKey"])
    all_notebooks_active = proj.list_jupyter_notebooks(active=True, as_type="listitems")
    all_notebooks_inactive = proj.list_jupyter_notebooks(active=False, as_type="listitems")
    for notebook in all_notebooks_inactive:
        notebook_name = notebook['name']
        # additional_notebook_details = proj.get_jupyter_notebook(notebook_name).get_content().get_metadata()
        print(notebook)
    for notebook_active in all_notebooks_active:
        print(notebook_active)
2) Get kernelLastActivityTime (only available for active sessions)

import dataiku

client = dataiku.api_client()
projects = client.list_projects()

for p in projects:
    proj = client.get_project(p["projectKey"])
    # active=True restricts the list to notebooks that currently have a running kernel
    all_notebooks_active = proj.list_jupyter_notebooks(active=True, as_type="listitems")
    for notebook in all_notebooks_active:
        # The session details include kernelLastActivityTime
        sessions = proj.get_jupyter_notebook(notebook['name']).get_sessions()
        print(sessions)
Hope this helps!
-
Hi Alex,
How can we fetch this information in DSS 8?
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,215 Dataiker
Unfortunately, the get_jupyter_notebook() and list_jupyter_notebooks() methods, which return notebook objects from the project object, were added to the project API in DSS 9, so you will need to upgrade to use them. In DSS 8 you can use list_running_notebooks(), but this is limited to active notebooks only. Here is an example:
import dataiku
from dataiku import pandasutils as pdu
import pandas as pd
import datetime

client = dataiku.api_client()
projects = client.list_projects()

for p in projects:
    proj = client.get_project(p["projectKey"])
    for notebook in proj.list_running_notebooks():
        for activesession in notebook.get_state()["activeSessions"]:
            # Timestamps are in milliseconds, hence the division by 1000
            session_start_time = datetime.datetime.fromtimestamp(round(activesession["sessionStartTime"] / 1000))
            print(
                notebook.get_state()['name'],
                session_start_time,
                datetime.datetime.fromtimestamp(notebook.get_state()["lastModifiedOn"] / 1000))
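The example above divides sessionStartTime and lastModifiedOn by 1000 before converting them, which suggests they are milliseconds since the Unix epoch. A minimal sketch of that conversion as a reusable helper follows. The name ms_to_datetime and the sample dict are hypothetical, and the helper returns UTC rather than the server's local time so the result does not depend on where it runs.

```python
import datetime


def ms_to_datetime(epoch_ms):
    """Convert a millisecond epoch timestamp to a timezone-aware UTC datetime."""
    return datetime.datetime.fromtimestamp(epoch_ms / 1000, tz=datetime.timezone.utc)


# Hypothetical stand-in for part of a notebook state dict (assumption: values are in ms)
example_state = {"name": "my notebook", "lastModifiedOn": 1627402094000}
print(example_state["name"], ms_to_datetime(example_state["lastModifiedOn"]).isoformat())
```

With this helper, the two fromtimestamp calls in the loop could be replaced by ms_to_datetime(activesession["sessionStartTime"]) and ms_to_datetime(state["lastModifiedOn"]).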
-
Hi Alex,
For some reason we can't upgrade DSS right now.
Is the last modified information available in the git history logs? If yes, how can we fetch it?
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,215 Dataiker
You can use git log on the local git repository from within a project:

git log --grep='Saved Jupyter notebook' --pretty=format:"%h%x09%an%x09%ad%x09%s" --date=iso-local
90bc0b1 alex 2021-07-27 16:08:14 +0000 Saved Jupyter notebook 'alex's Python notebook'
56935ae alex 2021-07-27 16:07:21 +0000 Saved Jupyter notebook 'alex's Python notebook'
But finding the last commit for a particular file can prove tricky. You would need to iterate over all projects, then likely create a dataset out of this data and use a Group By recipe to get the last date.
You can also see the last modification time of a notebook file directly from the command line:
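As an alternative to building a dataset and grouping in a recipe, the grouping can be sketched in plain Python. The function below is only an illustration: latest_saves is a hypothetical name, and it assumes the output comes from the git log command above, with tab-separated fields (hash, author, date, subject), dates in the "YYYY-MM-DD HH:MM:SS +ZZZZ" form, and commit subjects of the form Saved Jupyter notebook '<name>'.

```python
import datetime

# Assumption: notebook save commits use this subject prefix, as in the log output above
PREFIX = "Saved Jupyter notebook '"


def latest_saves(git_log_output):
    """Return {notebook name: most recent save datetime} from tab-separated git log output."""
    latest = {}
    for line in git_log_output.splitlines():
        parts = line.split("\t", 3)
        if len(parts) != 4:
            continue  # skip blank or unexpected lines
        _commit, _author, date_str, subject = parts
        if not (subject.startswith(PREFIX) and subject.endswith("'")):
            continue
        name = subject[len(PREFIX):-1]
        saved_at = datetime.datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S %z")
        if name not in latest or saved_at > latest[name]:
            latest[name] = saved_at
    return latest


sample = (
    "90bc0b1\talex\t2021-07-27 16:08:14 +0000\tSaved Jupyter notebook 'alex's Python notebook'\n"
    "56935ae\talex\t2021-07-27 16:07:21 +0000\tSaved Jupyter notebook 'alex's Python notebook'\n"
)
for name, saved_at in latest_saves(sample).items():
    print(name, saved_at.isoformat())
```

Running this once per project directory and merging the resulting dictionaries would give the last save date per notebook across projects.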
[dataiku@ip-172-31-10-169 projects]$ for d in */ ; do echo $d; ls -lrt --time-style="full-iso" "$d"/ipython_notebooks/ | grep ipynb; done
TESTING/
-rw-r--r--. 1 dataiku dataiku 19592 2021-07-27 16:08:14.623520014 +0000 alex's Python notebook.ipynb
-rw-r--r--. 1 dataiku dataiku 19592 2021-07-27 16:08:14.708517196 +0000 alex's Python notebook-Copy1.ipyn
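If a script is more convenient than the shell loop, the same file modification times can be read with the Python standard library. This is a rough sketch rather than a DSS API: notebook_mtimes and its projects_dir argument are hypothetical names, and it assumes the layout shown above, where each project folder contains an ipython_notebooks subfolder with .ipynb files.

```python
import datetime
from pathlib import Path


def notebook_mtimes(projects_dir):
    """Return (project, notebook file name, last modification time in UTC) tuples."""
    results = []
    for project_dir in sorted(Path(projects_dir).iterdir()):
        notebooks_dir = project_dir / "ipython_notebooks"
        if not notebooks_dir.is_dir():
            continue  # project without notebooks, or not a project folder
        for nb in sorted(notebooks_dir.glob("*.ipynb")):
            mtime = datetime.datetime.fromtimestamp(
                nb.stat().st_mtime, tz=datetime.timezone.utc
            )
            results.append((project_dir.name, nb.name, mtime))
    return results
```

It would be called with the path to the projects folder from which the shell loop above was run, for example notebook_mtimes("."), on the DSS server itself since it reads the files directly.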