Clear Notebook Output before checking into Version Control

sujayramaiah
edited July 16 in General Discussion

Suppose we are working with sensitive data and do not want cell outputs to be stored in the version control system. How can we clear the outputs by default?

When checking in notebooks separately, we use the following library, which clears the output cells:

# Install libraries
pip install nbstripout nbconvert

# Inside the repository folder (with notebooks)
nbstripout --install

This registers nbstripout as a Git filter for the repository, so outputs are stripped from every notebook in the repo when it is staged. Testing this out:

git add Test.ipynb
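To confirm that the staged copy of a notebook is really output-free, and to stop a commit if it is not, a small pre-commit guard can inspect the index. This is a minimal sketch that assumes bash and git. The function names are hypothetical, and the guard is not part of nbstripout or DSS. Detecting outputs by grepping the raw JSON is a heuristic.

```shell
#!/usr/bin/env bash
# Hypothetical pre-commit guard (not part of nbstripout or DSS): refuses a commit
# if any staged .ipynb file still contains cell outputs.

# Reads notebook JSON on stdin and exits 0 if at least one output object is present.
# Heuristic: every nbformat v4 output has an "output_type" key, and quotes inside
# JSON string values are always escaped (\"), so code that merely mentions the
# text output_type does not match.
notebook_has_outputs() {
  grep -q '"output_type"'
}

check_staged_notebooks() {
  local status=0 f
  # -z gives NUL-separated names, so paths containing spaces are handled.
  while IFS= read -r -d '' f; do
    # ":$f" is the staged (index) copy, i.e. what will actually be committed,
    # after any clean filter such as nbstripout has already run on it.
    if git show ":$f" | notebook_has_outputs; then
      echo "error: $f still contains cell outputs" >&2
      status=1
    fi
  done < <(git diff --cached --name-only --diff-filter=ACM -z -- '*.ipynb')
  return "$status"
}

# Run the check only when executed directly (e.g. as .git/hooks/pre-commit), not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  check_staged_notebooks || exit 1
fi
```

To try it, save the script as `.git/hooks/pre-commit` and make it executable with `chmod +x .git/hooks/pre-commit`. Because the guard checks the index rather than the working copy, it complements the nbstripout filter instead of duplicating it.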

How can we achieve this in Dataiku?
Can this be integrated into Dataiku as an option?

Thanks,

Sujay

Answers

  • Alexandru (Dataiker)

    Hi @sujayramaiah,

    The ability to strip notebook output before commits is currently in our backlog.

    Unfortunately, given how Git and notebooks are currently set up in DSS, I don't see a way to integrate the method you mentioned.

    However, if a user clears all outputs from the menu (Cell > All Output > Clear) and saves the notebook before committing their changes to the external Git repository, that repository will not contain any data from the notebook outputs.
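The same clearing step can be scripted for a notebook file in a local clone of the external repository. This is a minimal sketch that assumes bash and python3 are on PATH. `clear_notebook_outputs` is a hypothetical helper, not a DSS or Jupyter API. It empties each code cell's outputs and resets its execution count, and it leaves sources and markdown cells untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a DSS or Jupyter API): clears outputs from one notebook
# file in place. Assumes python3 is on PATH; only the standard-library json module is used.
clear_notebook_outputs() {
  # "python3 -" reads the script from the heredoc; the notebook path becomes sys.argv[1].
  python3 - "$1" <<'PY'
import json
import sys

path = sys.argv[1]
with open(path, encoding="utf-8") as fh:
    nb = json.load(fh)

for cell in nb.get("cells", []):
    if cell.get("cell_type") == "code":
        cell["outputs"] = []            # drop rendered results (text, tables, images)
        cell["execution_count"] = None  # reset the In [n] counter

with open(path, "w", encoding="utf-8") as fh:
    json.dump(nb, fh, indent=1, ensure_ascii=False)
    fh.write("\n")
PY
}
```

For example, `clear_notebook_outputs Test.ipynb`. For real use, a maintained tool is preferable: running `nbstripout Test.ipynb` strips the file in place.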

    Hope that helps!

  • ClaudiusH (Dataiker Alumni)

    A product idea for this has been submitted. Please show your support by voting for it, and add your expectations for the ideal implementation in the comments there.

  • apichery (Dataiker, Product Ideas Manager)
    edited November 15

    This feature has been implemented in DSS 11.3.0.

    All notebooks created since DSS 11.3.0 are stored as two files: one with output cells (in a local directory) and one without output cells (in the config directory, which is the Git-versioned one).

    So it should work out of the box, with no additional steps.

    See DSS 11.3.0 release notes
