Output a notebook from a custom recipe

Solved!
weaam7
Level 2

Hello everyone, 

I am creating a custom recipe whose output should be a notebook. Is there an approach for this?

 

Thank you!

1 Solution
ZachM
Dataiker

Hi @weaam7 ,

Thank you for providing additional information.

Code recipes don't support setting a notebook as the output. Instead, you can add a custom macro to your plugin.
The macro you create will accept the name of a dataset as an input parameter, and it will then create the notebook.

Before you create the notebook, you'll need an example notebook file to base it on. You can get this by extracting the contents of an existing notebook.
Run the following code in a new notebook. It will print the contents of the notebook that you specify:

import dataiku

client = dataiku.api_client()
project_key = dataiku.default_project_key()
project = client.get_project(project_key)

# Get the contents of an existing notebook to use as a reference.
# Replace 'MY_NOTEBOOK' with the name of a notebook that you want to copy.
notebook = project.get_jupyter_notebook('MY_NOTEBOOK')
raw_content = notebook.get_content().get_raw()
raw_content
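
For orientation, here is a minimal sketch of how the extracted content might be inspected. It assumes the raw content is a dict in the standard Jupyter notebook layout (`cells`, `metadata`, `nbformat`, `nbformat_minor`). The sample dict and the helper names `cell_source` and `summarize` are hypothetical stand-ins, not part of the Dataiku API.

```python
import json

# Hypothetical stand-in for the dict returned by get_raw(); a real notebook
# will usually have more metadata and more cells.
raw_content = {
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# My chart\n", "Some notes"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": "print('hello')"},
    ],
    "metadata": {"kernelspec": {"display_name": "Python 3",
                                "language": "python", "name": "python3"}},
    "nbformat": 4,
    "nbformat_minor": 4,
}


def cell_source(cell):
    """Return a cell's source as one string (it may be a string or a list of lines)."""
    source = cell.get("source", "")
    return source if isinstance(source, str) else "".join(source)


def summarize(raw):
    """List (cell_type, source) pairs so the structure is easy to inspect."""
    return [(cell["cell_type"], cell_source(cell)) for cell in raw.get("cells", [])]


for cell_type, source in summarize(raw_content):
    print(f"[{cell_type}]\n{source}\n")

# A JSON dump is convenient for copying the structure elsewhere. JSON
# null/true/false must become None/True/False when pasted into Python code.
as_json = json.dumps(raw_content, indent=2)
```

When copying the real content into runnable.py, keeping the `metadata` block from your own notebook is probably the safest choice, since kernel names can differ between instances.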



Now that we have notebook contents, we can create the plugin macro.
Below, I have example runnable.json and runnable.py files that you can use as a template for your plugin macro. Add a Macro component to your plugin, and replace the contents of the two files.

/* runnable.json */
{
    "meta": {
        // label: name of the runnable as displayed, should be short
        "label": "Create notebook",

        // description: longer string to help end users understand what this runnable does
        "description": "",

        // icon: must be one of the FontAwesome 3.2.1 icons, complete list here at https://fontawesome.com/v3.2.1/icons/
        "icon": "icon-puzzle-piece"
    },

    /* whether the runnable's code is untrusted */
    "impersonate": false,


    /* params:
    DSS will generate a form from this list of requested parameters.
    Your component code can then access the value provided by users using the "name" field of each parameter.

    Available parameter types include:
    STRING, INT, DOUBLE, BOOLEAN, DATE, SELECT, TEXTAREA, DATASET, DATASET_COLUMN, MANAGED_FOLDER, PRESET and others.

    For the full list and for more details, see the documentation: https://doc.dataiku.com/dss/latest/plugins/reference/params.html
    */
    "params": [
        {
            "name": "dataset",
            "label": "Dataset name",
            "type": "STRING",
            "description": "Dataset to base the notebook on",
            "mandatory": true
        },
        {
            "name": "notebook",
            "label": "Notebook name",
            "type": "STRING",
            "description": "Name of the notebook that will be created",
            "mandatory": true
        }
    ],

    /* list of required permissions on the project to see/run the runnable */
    "permissions": [],

    /* what the code's run() returns:
       - NONE : no result
       - HTML : a string that is a html (utf8 encoded)
       - FOLDER_FILE : a (folderId, path) pair to a file in a folder of this project (json-encoded)
       - FILE : raw data (as a python string) that will be stored in a temp file by DSS
       - URL : a url
     */
    "resultType": "HTML",

    /* label to use when the runnable's result is not inlined in the UI (ex: for urls) */
    "resultLabel": "my production",

    /* for FILE resultType, the extension to use for the temp file */
    "extension": "txt",

    /* for FILE resultType, the type of data stored in the temp file */
    "mimeType": "text/plain",

    /* Macro roles define where this macro will appear in DSS GUI. They are used to pre-fill a macro parameter with context.

       Each role consists of:
        - type: where the macro will be shown
            * when selecting DSS object(s): DATASET, DATASETS, API_SERVICE, API_SERVICE_VERSION, BUNDLE, VISUAL_ANALYSIS, SAVED_MODEL, MANAGED_FOLDER
            * in the global project list: PROJECT_MACROS
        - targetParamsKey(s): name of the parameter(s) that will be filled with the selected object
    */
    "macroRoles": [
      {
        "type": "DATASET",
        "targetParamsKey": "dataset"
      }
    ]
}

"""runnable.py"""
import dataiku
from dataiku.runnables import Runnable

class MyRunnable(Runnable):
    """The base interface for a Python runnable"""

    def __init__(self, project_key, config, plugin_config):
        """
        :param project_key: the project in which the runnable executes
        :param config: the dict of the configuration of the object
        :param plugin_config: contains the plugin settings
        """
        self.project_key = project_key
        self.config = config
        self.plugin_config = plugin_config
        self._init_project()

    def _init_project(self):
        client = dataiku.api_client()
        self.project = client.get_project(self.project_key)

    def get_progress_target(self):
        """
        If the runnable will return some progress info, have this function return a tuple of
        (target, unit) where unit is one of: SIZE, FILES, RECORDS, NONE
        """
        return None

    def run(self, progress_callback):
        """
        Do stuff here. Can return a string or raise an exception.
        The progress_callback is a function expecting 1 value: current progress
        """
        # Load the input dataset.
        # You can reference this when creating the notebook contents.
        dataset_name = self.config['dataset']
        mydataset = dataiku.Dataset(dataset_name, self.project_key)
        mydataset_df = mydataset.get_dataframe()


        # ******** CREATE YOUR NOTEBOOK CONTENT HERE ********
        notebook_content = {
            ...
        }


        # Create a new notebook.
        notebook_name = self.config['notebook']
        self.project.create_jupyter_notebook(notebook_name, notebook_content)

        return f'Created notebook: {notebook_name}'

Be sure to replace the ... in runnable.py with your notebook contents that you generated above.
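
If the notebook should depend on the macro's parameters, one option is to build the content dict in a helper function rather than pasting a fixed dict. The following is only a sketch under assumptions: the helper names `make_cell` and `build_notebook_content`, the parameters `x_col`, `y_col` and `title`, and the default kernel metadata are all hypothetical, and the generated cell assumes pandas plotting with matplotlib is available in the notebook's code environment.

```python
def make_cell(cell_type, source):
    """Build one notebook cell in the standard Jupyter (nbformat 4) layout."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells also carry an execution counter and an outputs list.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def build_notebook_content(dataset_name, x_col, y_col, title, metadata=None):
    """Return a notebook dict whose code cell plots y_col against x_col.

    The kernelspec below is an assumption; copying the metadata of the
    reference notebook you extracted is likely more reliable.
    """
    if metadata is None:
        metadata = {"kernelspec": {"display_name": "Python 3",
                                   "language": "python", "name": "python3"}}
    # repr() (via !r) embeds the values as valid Python string literals.
    plot_code = "\n".join([
        "import dataiku",
        "import matplotlib.pyplot as plt",
        "",
        f"df = dataiku.Dataset({dataset_name!r}).get_dataframe()",
        f"df.plot(x={x_col!r}, y={y_col!r}, title={title!r})",
        "plt.show()",
    ])
    return {
        "cells": [make_cell("markdown", f"# {title}"),
                  make_cell("code", plot_code)],
        "metadata": metadata,
        "nbformat": 4,
        "nbformat_minor": 4,
    }


notebook_content = build_notebook_content(
    "sales", "month", "revenue", "Revenue by month")
```

Inside run(), the arguments would come from self.config, for example self.config['dataset'] plus whatever extra parameters (such as x, y and title) you add to the "params" list in runnable.json.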

We specified a macro role in runnable.json, which means that you can run the macro directly from a dataset, with the dataset parameter pre-filled.

 

 

Please let me know if you have any questions.

Thanks,

Zach M


7 Replies
ZachM
Dataiker

Hi @weaam7,

I'm not sure what you mean when you say that you're creating a custom recipe.
Can you please elaborate on what type of recipe you're creating? For example, is it a Python code recipe, or are you developing a custom plugin?

Also, please provide some more information about why you wanted the recipe to output to a notebook. Notebooks aren't typically used as the output to a recipe, so there might be a better way to accomplish your goal.


Thanks,
Zach

weaam7
Level 2
Author

Hello @ZachM,

I am trying to develop a custom plugin using Python, in which the input is a dataset specifying x, y, etc. that will be used to generate a complete notebook for the chosen chart type.

That is the main idea. Is there anything I need to clarify?

 

Thank you.

 

Ignacio_Toledo

Hello @weaam7,

If I understand you correctly, you'd like to create a Jupyter notebook report as the output? I think the nbformat package can be used for that, as shown in this gist:

https://gist.github.com/fperez/9716279

Maybe that can help you to start.

weaam7
Level 2
Author

Hello @Ignacio_Toledo,

I tried to use this library, but I couldn't figure out why write() doesn't work when I use it in the plugin script.

 

Thanks for your help! 

weaam7
Level 2
Author

Hello @ZachM,

 

Big thanks! This is exactly what I want.

I have one last question: how can I use the inputs of my plugin (x, y, titles, etc.) within the macro in the 'notebook_content' field?

weaam7
Level 2
Author

Hello @ZachM,

Thanks, everything is clear now and it worked just fine!