How to fetch notebook details

Raj7974
Level 1
Hi, 

I want to fetch last_modified_on and last_run_date for all notebooks in DSS using the Python API.

Please help

Thanks, 

Prince

5 Replies
AlexT
Dataiker

Hi,

Starting with DSS 9, we introduced a new list_jupyter_notebooks method, which returns some of the information you are looking for.

Below are samples that go through all projects and get the available metadata about the notebooks, including the last-modified information.

However, last_run_date is not available. The closest thing is kernelLastActivityTime, which is available only for currently active notebooks through get_sessions().

1) Get last_modified_on for all notebooks in all projects

 

import dataiku

client = dataiku.api_client()
projects = client.list_projects()

for p in projects:
    proj = client.get_project(p["projectKey"])
    # List items carry the available notebook metadata, including the last-modified information
    all_notebooks_active = proj.list_jupyter_notebooks(active=True, as_type="listitems")
    all_notebooks_inactive = proj.list_jupyter_notebooks(active=False, as_type="listitems")
    for notebook in all_notebooks_inactive:
        # Optional: uncomment to read further metadata from the notebook content
        # notebook_name = notebook['name']
        # additional_notebook_details = proj.get_jupyter_notebook(notebook_name).get_content().get_metadata()
        print(notebook)
    for notebook_active in all_notebooks_active:
        print(notebook_active)

 

 

2) Get kernelLastActivityTime (only available for active sessions)

 

 

import dataiku

client = dataiku.api_client()
projects = client.list_projects()

for p in projects:
    proj = client.get_project(p["projectKey"])
    # active=True: only notebooks that are currently running
    active_notebooks = proj.list_jupyter_notebooks(active=True, as_type="listitems")
    for notebook in active_notebooks:
        # Session details are where kernelLastActivityTime is reported
        sessions = proj.get_jupyter_notebook(notebook['name']).get_sessions()
        print(sessions)
       

 

 

Hope this helps!

Raj7974
Level 1
Author

Hi Alex, 

 

How can we fetch this information in DSS 8?

Raj7974
Level 1
Author

Hi Alex, 

For some reason, we can't upgrade DSS right now.

Is the last-modified-on information available in the Git history logs? If yes, how can we fetch it?

 

AlexT
Dataiker

Unfortunately, the get_jupyter_notebook and list_jupyter_notebooks methods, which return notebook objects from the project object, were added to the project API in DSS 9, so you will need to upgrade to use them. In DSS 8, you can use list_running_notebooks(), but this is limited to active notebooks only. Here is an example:

 

import dataiku
import datetime

client = dataiku.api_client()
projects = client.list_projects()

for p in projects:
    proj = client.get_project(p["projectKey"])
    for notebook in proj.list_running_notebooks():
        state = notebook.get_state()  # fetch the state once per notebook
        for activesession in state["activeSessions"]:
            # Timestamps are in epoch milliseconds, so divide by 1000 to get seconds
            session_start_time = datetime.datetime.fromtimestamp(round(activesession["sessionStartTime"] / 1000))
            last_modified = datetime.datetime.fromtimestamp(state["lastModifiedOn"] / 1000)
            print(state['name'], session_start_time, last_modified)
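
The division by 1000 in the example is there because the timestamps are in epoch milliseconds. Here is a minimal sketch of that conversion as a standalone helper. The name ms_to_datetime is hypothetical, and returning a UTC-aware datetime (rather than server-local time, as fromtimestamp does without a tz argument) is a choice made for this sketch, not something the DSS API requires.

```python
from datetime import datetime, timezone


def ms_to_datetime(epoch_ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime.

    Hypothetical helper for illustration; not part of the Dataiku API.
    """
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


# Example: the Unix epoch itself, printed in ISO 8601 form
print(ms_to_datetime(0).isoformat())
```

With a helper like this, a value such as state["lastModifiedOn"] could be passed in directly, and the result compares consistently regardless of the server's timezone.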

AlexT
Dataiker

You can run git log on the local Git repository from within a project directory.

 git log  --grep='Saved Jupyter notebook'  --pretty=format:"%h%x09%an%x09%ad%x09%s" --date=iso-local

90bc0b1 alex    2021-07-27 16:08:14 +0000       Saved Jupyter notebook 'alex's Python notebook'
56935ae alex    2021-07-27 16:07:21 +0000       Saved Jupyter notebook 'alex's Python notebook'

However, finding the last commit for a particular file can prove tricky. You would likely need to iterate over all projects, create a dataset out of this data, and then use a Group By recipe to get the last date for each notebook.
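
As a rough alternative to the dataset and Group By approach, the same grouping can be sketched in plain Python. This assumes the log text is in the tab-separated format produced by the --pretty option above (hash, author, date, subject) and that each subject has the form Saved Jupyter notebook '<name>'. The function name last_saved_per_notebook is hypothetical.

```python
from datetime import datetime, timezone

# Subject prefix used by the commits shown in the git log output above
PREFIX = "Saved Jupyter notebook '"


def last_saved_per_notebook(log_text):
    """Return {notebook name: most recent save time} from git log text.

    Assumes one commit per line, formatted as hash<TAB>author<TAB>date<TAB>subject.
    """
    latest = {}
    for line in log_text.splitlines():
        parts = line.split("\t", 3)
        if len(parts) != 4:
            continue  # skip anything that is not a 4-field log line
        _commit, _author, date_str, subject = parts
        if not (subject.startswith(PREFIX) and subject.endswith("'")):
            continue  # not a notebook save commit
        # Notebook names may contain apostrophes, so strip only the outer quotes
        name = subject[len(PREFIX):-1]
        saved_at = datetime.strptime(date_str.strip(), "%Y-%m-%d %H:%M:%S %z")
        saved_at = saved_at.astimezone(timezone.utc)
        if name not in latest or saved_at > latest[name]:
            latest[name] = saved_at
    return latest


# The two sample commits from the output above, with tabs between fields
sample = (
    "90bc0b1\talex\t2021-07-27 16:08:14 +0000\tSaved Jupyter notebook 'alex's Python notebook'\n"
    "56935ae\talex\t2021-07-27 16:07:21 +0000\tSaved Jupyter notebook 'alex's Python notebook'\n"
)
result = last_saved_per_notebook(sample)
for name, saved_at in result.items():
    print(name, saved_at.isoformat())
```

The log text could come from running the git log command in each project directory and capturing its output; how you collect it across projects is left open here.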

You can also see the last modification time of each notebook file directly from the command line:

[dataiku@ip-172-31-10-169 projects]$ for d in */ ; do echo $d;  ls -lrt --time-style="full-iso" "$d"/ipython_notebooks/ | grep ipynb;   done
TESTING/
-rw-r--r--. 1 dataiku dataiku 19592 2021-07-27 16:08:14.623520014 +0000 alex's Python notebook.ipynb
-rw-r--r--. 1 dataiku dataiku 19592 2021-07-27 16:08:14.708517196 +0000 alex's Python notebook-Copy1.ipyn
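
The same check can be sketched in Python by reading each file's modification time. This assumes the layout shown above, with one folder per project containing an ipython_notebooks subfolder, and that the script runs on the DSS server with read access to those files. The function name notebook_mtimes is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def notebook_mtimes(projects_dir):
    """List (project key, notebook file name, last modified in UTC) tuples.

    Assumes the layout <projects_dir>/<PROJECT_KEY>/ipython_notebooks/*.ipynb.
    """
    rows = []
    for nb_path in sorted(Path(projects_dir).glob("*/ipython_notebooks/*.ipynb")):
        project_key = nb_path.parent.parent.name
        # st_mtime is in epoch seconds; convert to a timezone-aware datetime
        modified = datetime.fromtimestamp(nb_path.stat().st_mtime, tz=timezone.utc)
        rows.append((project_key, nb_path.name, modified))
    return rows
```

Calling notebook_mtimes with the path of the projects directory from the prompt above would return one row per notebook file. Keep in mind that a file's modification time reflects the last write to disk, which may not always match the last save made by a user.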

 
