How to fetch notebook details

Raj7974
Raj7974 Registered Posts: 8 ✭✭✭

Hi,

I want to fetch last_modified_on and last_run_date for all notebooks in DSS using the Python API.

Please help

Thanks,

Prince

Answers

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,225 Dataiker
    edited July 17

    Hi,

    Starting with DSS 9 we introduced a new list_jupyter_notebooks method, which provides some of the information you are looking for.

    Below are samples that go through all projects and get the available metadata about the notebooks, including the last-modified time.

    However, the last_run_date is not available. The closest thing would be the kernelLastActivityTime, which is available only for currently active notebooks with get_sessions().

    1) Get last_modified_on for all notebooks in all projects

    import dataiku

    client = dataiku.api_client()
    projects = client.list_projects()

    for p in projects:
        proj = client.get_project(p["projectKey"])
        # "listitems" returns dict-like items carrying the notebook metadata
        all_notebooks_active = proj.list_jupyter_notebooks(active=True, as_type="listitems")
        all_notebooks_inactive = proj.list_jupyter_notebooks(active=False, as_type="listitems")
        for notebook in all_notebooks_inactive:
            notebook_name = notebook['name']

            # Optional: uncomment to fetch more details for each notebook
            #additional_notebook_details = proj.get_jupyter_notebook(notebook_name).get_content().get_metadata()

            print(notebook)
        for notebook_active in all_notebooks_active:
            print(notebook_active)
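
If the goal is a table rather than printed output, the same loop can gather rows first. This is only a minimal sketch: collect_notebook_rows is a hypothetical helper name, it assumes the list items behave like dicts (the sample above indexes them with notebook['name']), and which fields each item carries depends on the DSS version.

```python
def collect_notebook_rows(client, active=False):
    """Return one dict per notebook, tagged with the key of its project.

    `client` is expected to be the object returned by dataiku.api_client().
    Only calls already used in the sample above are made on it.
    """
    rows = []
    for p in client.list_projects():
        project_key = p["projectKey"]
        proj = client.get_project(project_key)
        for item in proj.list_jupyter_notebooks(active=active, as_type="listitems"):
            row = dict(item)                  # copy the metadata returned for this notebook
            row["projectKey"] = project_key   # remember which project it came from
            rows.append(row)
    return rows
```

Inside DSS this would be called as rows = collect_notebook_rows(dataiku.api_client()), and pd.DataFrame(rows) then gives one row per notebook.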

    2) Get kernelLastActivityTime (only available for active sessions)

    import dataiku

    client = dataiku.api_client()
    projects = client.list_projects()

    for p in projects:
        proj = client.get_project(p["projectKey"])
        # active=True restricts the listing to currently running notebooks
        all_notebooks_active = proj.list_jupyter_notebooks(active=True, as_type="listitems")
        for notebook in all_notebooks_active:
            sessions = proj.get_jupyter_notebook(notebook['name']).get_sessions()

            print(sessions)
           

    Hope this helps!

  • Raj7974
    Raj7974 Registered Posts: 8 ✭✭✭

    Hi Alex,

    How can we fetch this information in DSS 8?

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,225 Dataiker
    edited July 17

    Unfortunately, the "get_jupyter_notebook" and "list_jupyter_notebooks" methods, which return notebook objects from the project object, were added to the project API in DSS 9, so you will need to upgrade to use them. In DSS 8 you can use list_running_notebooks(), but this is limited to active notebooks only. Here is an example:

    import dataiku
    import datetime

    client = dataiku.api_client()
    projects = client.list_projects()

    for p in projects:
        proj = client.get_project(p["projectKey"])
        for notebook in proj.list_running_notebooks():
            # Fetch the state once per notebook instead of once per field
            state = notebook.get_state()
            for activesession in state["activeSessions"]:
                # Timestamps are in epoch milliseconds, hence the division by 1000
                session_start_time = datetime.datetime.fromtimestamp(round(activesession["sessionStartTime"] / 1000))
                last_modified = datetime.datetime.fromtimestamp(state["lastModifiedOn"] / 1000)
                print(state['name'], session_start_time, last_modified)
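
The example divides the state timestamps by 1000 because they are epoch values in milliseconds, and datetime.fromtimestamp() then renders them in the server's local time zone. A small sketch of a helper that returns time-zone-aware UTC values instead; ms_to_utc is a hypothetical name, not part of the DSS API.

```python
from datetime import datetime, timezone


def ms_to_utc(epoch_ms):
    """Convert an epoch timestamp in milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


# Example with an arbitrary millisecond timestamp
print(ms_to_utc(1627402094000).isoformat())
```

Using aware datetimes makes the values comparable across servers that run in different time zones.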
    


  • Raj7974
    Raj7974 Registered Posts: 8 ✭✭✭

    Hi Alex,

    For some reason we can't upgrade DSS right now.

    Is the last-modified-on information available in the git history logs? If yes, how can we fetch it?

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,225 Dataiker
    edited July 17

    You can use git log on the local git repository from within a project directory.

     git log  --grep='Saved Jupyter notebook'  --pretty=format:"%h%x09%an%x09%ad%x09%s" --date=iso-local
    
    90bc0b1 alex    2021-07-27 16:08:14 +0000       Saved Jupyter notebook 'alex's Python notebook'
    56935ae alex    2021-07-27 16:07:21 +0000       Saved Jupyter notebook 'alex's Python notebook'

    But finding the last commit for a particular file can prove tricky. You would need to iterate over all projects, likely create a dataset out of this data, and then use a Group By recipe to get the last date.
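
The "last date per notebook" step can also be done in plain Python before any dataset is built. A rough sketch, assuming the log was produced with exactly the --pretty format and --date=iso-local options shown above (tab-separated hash, author, date, subject); last_saved_per_notebook is a hypothetical helper name.

```python
from datetime import datetime

# Commit subject prefix seen in the sample log output
PREFIX = "Saved Jupyter notebook '"


def last_saved_per_notebook(log_lines):
    """Map each notebook name to the most recent commit date found in the log."""
    latest = {}
    for line in log_lines:
        parts = line.rstrip("\n").split("\t", 3)
        if len(parts) != 4:
            continue  # not in the hash/author/date/subject layout
        _commit, _author, date_text, subject = parts
        subject = subject.strip()
        if not (subject.startswith(PREFIX) and subject.endswith("'")):
            continue  # some other kind of commit
        name = subject[len(PREFIX):-1]
        saved_at = datetime.strptime(date_text.strip(), "%Y-%m-%d %H:%M:%S %z")
        if name not in latest or saved_at > latest[name]:
            latest[name] = saved_at
    return latest
```

The input can be the output of the git log command split into lines, for example collected with subprocess.run(..., capture_output=True, text=True).stdout.splitlines() in each project directory.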

    You can also see the last modification time of a notebook file directly from the command line:

    [dataiku@ip-172-31-10-169 projects]$ for d in */ ; do echo $d;  ls -lrt --time-style="full-iso" "$d"/ipython_notebooks/ | grep ipynb;   done
    TESTING/
    -rw-r--r--. 1 dataiku dataiku 19592 2021-07-27 16:08:14.623520014 +0000 alex's Python notebook.ipynb
    -rw-r--r--. 1 dataiku dataiku 19592 2021-07-27 16:08:14.708517196 +0000 alex's Python notebook-Copy1.ipyn
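
The same file-system check can be scripted in Python. A minimal sketch, assuming it runs on the DSS server with read access and that projects_dir points at the directory the shell loop above was run from (each project folder containing an ipython_notebooks subfolder); notebook_mtimes is a hypothetical helper name.

```python
from datetime import datetime, timezone
from pathlib import Path


def notebook_mtimes(projects_dir):
    """Return {(project, notebook file name): aware UTC modification time}."""
    result = {}
    # Same layout as the shell loop: <project>/ipython_notebooks/*.ipynb
    for path in Path(projects_dir).glob("*/ipython_notebooks/*.ipynb"):
        project = path.parent.parent.name
        mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        result[(project, path.name)] = mtime
    return result
```

Note that this reports the file's modification time on disk, which is not necessarily the last time the notebook was run.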
