How to fetch notebook details

Raj7974 Registered Posts: 8 ✭✭✭

Hi,

I want to fetch last_modified_on and last_run_date for all notebooks in DSS using the Python API.

Please help

Thanks,

Prince

Answers

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker

    Hi,

    Starting with DSS 9 we introduced a new list_jupyter_notebooks method, which returns some of the information you are looking for.

    Below are samples that go through all projects and get the available metadata about the notebooks, including the last-modified time.

    However, the last_run_date is not available. The closest thing would be kernelLastActivityTime, which get_sessions() returns only for currently active notebooks.

    1) Get last_modified_on for all notebooks in all projects

        import dataiku
        from dataiku import pandasutils as pdu
        import pandas as pd

        client = dataiku.api_client()
        projects = client.list_projects()
        for p in projects:
            proj = client.get_project(p["projectKey"])
            all_notebooks_active = proj.list_jupyter_notebooks(active=True, as_type="listitems")
            all_notebooks_inactive = proj.list_jupyter_notebooks(active=False, as_type="listitems")
            for notebook in all_notebooks_inactive:
                notebook_name = notebook['name']
                #additional_notebook_details = proj.get_jupyter_notebook(notebook_name).get_content().get_metadata()
                print(notebook)
            for notebook_active in all_notebooks_active:
                print(notebook_active)

    2) Get kernelLastActivityTime (only available for active sessions)

        import dataiku
        from dataiku import pandasutils as pdu
        import pandas as pd

        client = dataiku.api_client()
        projects = client.list_projects()
        for p in projects:
            proj = client.get_project(p["projectKey"])
            # active=True restricts the list to notebooks that are currently running
            active_notebooks = proj.list_jupyter_notebooks(active=True, as_type="listitems")
            for notebook in active_notebooks:
                # the session details include kernelLastActivityTime
                sessions = proj.get_jupyter_notebook(notebook['name']).get_sessions()
                print(sessions)

    Hope this helps!

  • Raj7974
    Raj7974 Registered Posts: 8 ✭✭✭

    Hi Alex,

    How can we fetch this information in DSS 8?

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker

    Unfortunately, the "get_jupyter_notebook" and "list_jupyter_notebooks" methods, which return notebook objects from the project object, were added to the project API in DSS 9, so you will need to upgrade to use them. In DSS 8 you can use list_running_notebooks(), but this is limited to active notebooks only. Here is an example:

        import dataiku
        from dataiku import pandasutils as pdu
        import pandas as pd
        import datetime

        client = dataiku.api_client()
        projects = client.list_projects()
        for p in projects:
            proj = client.get_project(p["projectKey"])
            for notebook in proj.list_running_notebooks():
                for activesession in notebook.get_state()["activeSessions"]:
                    session_start_time = datetime.datetime.fromtimestamp(round(activesession["sessionStartTime"] / 1000))
                    print(notebook.get_state()['name'], session_start_time, datetime.datetime.fromtimestamp(notebook.get_state()["lastModifiedOn"] / 1000))
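
    The timestamps in the notebook state are epoch milliseconds, which is why the example divides by 1000. As a minimal sketch of just that conversion, runnable without a DSS instance: the helper names ms_to_datetime and summarize_state are hypothetical, the sample state dict contains made-up values, and only the keys name, lastModifiedOn, activeSessions and sessionStartTime are taken from the example above.

```python
from datetime import datetime, timezone


def ms_to_datetime(epoch_ms):
    """Convert an epoch timestamp in milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


def summarize_state(state):
    """Return (name, last_modified, [session_start, ...]) from a notebook state dict.

    Assumes the dict uses the keys seen in the example above; a state with
    no active sessions yields an empty list of session start times.
    """
    session_starts = [
        ms_to_datetime(session["sessionStartTime"])
        for session in state.get("activeSessions", [])
    ]
    return state["name"], ms_to_datetime(state["lastModifiedOn"]), session_starts


# Hypothetical state dict with illustrative values only.
example_state = {
    "name": "example notebook",
    "lastModifiedOn": 1627401600000,
    "activeSessions": [{"sessionStartTime": 1627398000000}],
}

name, last_modified, starts = summarize_state(example_state)
print(name, last_modified.isoformat(), [s.isoformat() for s in starts])
```

    Unlike the example above, which prints times in the server's local timezone, this sketch returns UTC so the values are comparable across machines.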


  • Raj7974
    Raj7974 Registered Posts: 8 ✭✭✭

    Hi Alex,

    For some reason we can't upgrade DSS right now.

    Is the last-modified information available in the git history logs? If yes, how can we fetch it?

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker

    You can run git log on the local git repository from within a project directory.

        git log --grep='Saved Jupyter notebook' --pretty=format:"%h%x09%an%x09%ad%x09%s" --date=iso-local
        90bc0b1 alex 2021-07-27 16:08:14 +0000 Saved Jupyter notebook 'alex's Python notebook'
        56935ae alex 2021-07-27 16:07:21 +0000 Saved Jupyter notebook 'alex's Python notebook'

    But finding the last commit for a particular file can prove tricky. You would likely need to iterate over all projects, create a dataset out of this data, and then use a Group By recipe to get the last date for each notebook.
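
    The "keep only the last date per notebook" step can also be done in plain Python instead of a recipe. A minimal sketch, assuming the git log output uses the tab-separated format from the command above (%h, %an, %ad, %s with --date=iso-local) and that the commit subject always reads "Saved Jupyter notebook '<name>'"; the function name last_saved_per_notebook is hypothetical.

```python
import re
from datetime import datetime

# Greedy match so notebook names containing apostrophes are captured whole.
SAVE_PATTERN = re.compile(r"Saved Jupyter notebook '(.*)'$")


def last_saved_per_notebook(git_log_output):
    """Map each notebook name to the datetime of its most recent save commit."""
    latest = {}
    for line in git_log_output.splitlines():
        parts = line.split("\t", 3)  # hash, author, date, subject
        if len(parts) != 4:
            continue  # skip lines that do not follow the expected format
        _commit, _author, date_str, subject = parts
        match = SAVE_PATTERN.search(subject)
        if not match:
            continue
        saved_at = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S %z")
        notebook_name = match.group(1)
        if notebook_name not in latest or saved_at > latest[notebook_name]:
            latest[notebook_name] = saved_at
    return latest


# The two log lines from the output above, with tabs restored between fields.
sample = (
    "90bc0b1\talex\t2021-07-27 16:08:14 +0000\tSaved Jupyter notebook 'alex's Python notebook'\n"
    "56935ae\talex\t2021-07-27 16:07:21 +0000\tSaved Jupyter notebook 'alex's Python notebook'\n"
)
result = last_saved_per_notebook(sample)
for notebook_name, saved_at in result.items():
    print(notebook_name, saved_at.isoformat())
```

    To cover every project you would run the git log command in each project directory and feed each output to this function.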

    You can also see the last modification time of a notebook file directly from the command line:

        [dataiku@ip-172-31-10-169 projects]$ for d in */ ; do echo $d; ls -lrt --time-style="full-iso" "$d"/ipython_notebooks/ | grep ipynb; done
        TESTING/
        -rw-r--r--. 1 dataiku dataiku 19592 2021-07-27 16:08:14.623520014 +0000 alex's Python notebook.ipynb
        -rw-r--r--. 1 dataiku dataiku 19592 2021-07-27 16:08:14.708517196 +0000 alex's Python notebook-Copy1.ipyn
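
    The same file-time lookup can be sketched in Python, which is easier to turn into a dataset afterwards. This assumes the layout shown in the shell output above (one directory per project, each with an ipython_notebooks subdirectory) and that the script runs on the DSS server with read access to that projects directory; the function name notebook_mtimes and the path you pass in are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def notebook_mtimes(projects_dir):
    """Return (project_key, notebook_filename, last_modified_utc) for each .ipynb file.

    projects_dir is the directory that contains one subdirectory per project.
    """
    rows = []
    for nb_path in sorted(Path(projects_dir).glob("*/ipython_notebooks/*.ipynb")):
        # File modification time from the filesystem, as a UTC datetime.
        modified = datetime.fromtimestamp(nb_path.stat().st_mtime, tz=timezone.utc)
        project_key = nb_path.parent.parent.name
        rows.append((project_key, nb_path.name, modified))
    return rows


if __name__ == "__main__":
    # Run from inside the projects directory, like the shell loop above.
    for project_key, filename, modified in notebook_mtimes("."):
        print(project_key, filename, modified.isoformat())
```

    Note that this reports the file modification time on disk, which is not the same thing as a last run date.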
