Output a notebook from a custom recipe
Hello everyone,
I am creating a custom recipe, and I want its output to be a notebook.
Is there any approach for this?
Thank you!
Best Answer
-
Hi @weaam7,
Thank you for providing additional information.
Code recipes don’t support setting a notebook as the output, but instead, you can add a custom macro to your plugin.
The macro you create will accept the name of a dataset as an input parameter, and it will then create the notebook.
Before you create the notebook, you'll need an example notebook file to base it on. You can get this by extracting the contents of an existing notebook.
Run the following code in a new notebook. It will print the contents of the notebook that you specify:

    import dataiku

    client = dataiku.api_client()
    project_key = dataiku.default_project_key()
    project = client.get_project(project_key)

    # Get the contents of an existing notebook to use as a reference.
    # Replace 'MY_NOTEBOOK' with the name of a notebook that you want to copy.
    notebook = project.get_jupyter_notebook('MY_NOTEBOOK')
    raw_content = notebook.get_content().get_raw()
    raw_content
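The raw content printed above will usually carry the reference notebook's cell outputs and execution counts, which you probably don't want in a template. A minimal sketch for cleaning it up, assuming the raw content is a standard nbformat 4 dictionary with a top-level "cells" list; `strip_outputs` is a hypothetical helper, not part of the Dataiku API, and the `example` dict is only a stand-in for what `get_raw()` returns:

```python
import copy


def strip_outputs(raw_content):
    """Return a copy of a notebook dict with code-cell outputs removed.

    Assumes the nbformat 4 layout: a top-level "cells" list whose code
    cells carry "outputs" and "execution_count" keys.
    """
    cleaned = copy.deepcopy(raw_content)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Small stand-in for the dict returned by get_raw(), for illustration only.
example = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hi\n"]}
            ],
            "source": ["print('hi')"],
        },
    ],
}

template = strip_outputs(example)
```

In the notebook above, you would call `strip_outputs(raw_content)` instead of using the stand-in dict. The original dict is left unmodified because the helper works on a deep copy.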
Now that we have notebook contents, we can create the plugin macro.
Below, I have example runnable.json and runnable.py files that you can use as a template for your plugin macro. Add a Macro component to your plugin, and replace the contents of the two files.

    /* runnable.json */
    {
        "meta": {
            // label: name of the runnable as displayed, should be short
            "label": "Create notebook",

            // description: longer string to help end users understand what this runnable does
            "description": "",

            // icon: must be one of the FontAwesome 3.2.1 icons, complete list here at https://fontawesome.com/v3.2.1/icons/
            "icon": "icon-puzzle-piece"
        },

        /* whether the runnable's code is untrusted */
        "impersonate": false,

        /* params:
        DSS will generate a formular from this list of requested parameters.
        Your component code can then access the value provided by users using the "name" field of each parameter.

        Available parameter types include:
        STRING, INT, DOUBLE, BOOLEAN, DATE, SELECT, TEXTAREA, DATASET, DATASET_COLUMN, MANAGED_FOLDER, PRESET and others.

        For the full list and for more details, see the documentation: https://doc.dataiku.com/dss/latest/plugins/reference/params.html
        */
        "params": [
            {
                "name": "dataset",
                "label": "Dataset name",
                "type": "STRING",
                "description": "Dataset to base the notebook on",
                "mandatory": true
            },
            {
                "name": "notebook",
                "label": "Notebook name",
                "type": "STRING",
                "description": "Name of the notebook that will be created",
                "mandatory": true
            }
        ],

        /* list of required permissions on the project to see/run the runnable */
        "permissions": [],

        /* what the code's run() returns:
           - NONE : no result
           - HTML : a string that is a html (utf8 encoded)
           - FOLDER_FILE : a (folderId, path) pair to a file in a folder of this project (json-encoded)
           - FILE : raw data (as a python string) that will be stored in a temp file by DSS
           - URL : a url
        */
        "resultType": "HTML",

        /* label to use when the runnable's result is not inlined in the UI (ex: for urls) */
        "resultLabel": "my production",

        /* for FILE resultType, the extension to use for the temp file */
        "extension": "txt",

        /* for FILE resultType, the type of data stored in the temp file */
        "mimeType": "text/plain",

        /* Macro roles define where this macro will appear in DSS GUI. They are used to pre-fill a macro parameter with context.

           Each role consists of:
            - type: where the macro will be shown
                * when selecting DSS object(s): DATASET, DATASETS, API_SERVICE, API_SERVICE_VERSION, BUNDLE, VISUAL_ANALYSIS, SAVED_MODEL, MANAGED_FOLDER
                * in the global project list: PROJECT_MACROS
            - targetParamsKey(s): name of the parameter(s) that will be filled with the selected object
        */
        "macroRoles": [
            {
                "type": "DATASET",
                "targetParamsKey": "dataset"
            }
        ]
    }
    """runnable.py"""
    import dataiku
    from dataiku.runnables import Runnable


    class MyRunnable(Runnable):
        """The base interface for a Python runnable"""

        def __init__(self, project_key, config, plugin_config):
            """
            :param project_key: the project in which the runnable executes
            :param config: the dict of the configuration of the object
            :param plugin_config: contains the plugin settings
            """
            self.project_key = project_key
            self.config = config
            self.plugin_config = plugin_config
            self._init_project()

        def _init_project(self):
            client = dataiku.api_client()
            self.project = client.get_project(self.project_key)

        def get_progress_target(self):
            """
            If the runnable will return some progress info, have this function return a tuple of
            (target, unit) where unit is one of: SIZE, FILES, RECORDS, NONE
            """
            return None

        def run(self, progress_callback):
            """
            Do stuff here. Can return a string or raise an exception.
            The progress_callback is a function expecting 1 value: current progress
            """
            # Load the input dataset.
            # You can reference this when creating the notebook contents.
            dataset_name = self.config['dataset']
            mydataset = dataiku.Dataset(dataset_name, self.project_key)
            mydataset_df = mydataset.get_dataframe()

            # ******** CREATE YOUR NOTEBOOK CONTENT HERE ********
            notebook_content = { ... }

            # Create a new notebook.
            notebook_name = self.config['notebook']
            self.project.create_jupyter_notebook(notebook_name, notebook_content)

            return f'Created notebook: {notebook_name}'
Be sure to replace the ... in runnable.py with your notebook contents that you generated above.
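As a rough sketch of what could go in place of the ..., the function below builds a small nbformat 4 notebook dictionary from the macro's configuration. `build_notebook_content` and the `x_column`, `y_column`, and `title` keys are hypothetical: they would need matching entries in the params list of runnable.json, and only `dataset` comes from the template above. The kernel metadata is also an assumption and may need to match what your reference notebook contains.

```python
def build_notebook_content(config):
    """Build an nbformat 4 notebook dict from macro parameters.

    Hypothetical helper: "x_column", "y_column" and "title" are assumed
    extra parameters; only "dataset" exists in the template runnable.json.
    """
    dataset = config["dataset"]
    x_col = config.get("x_column", "x")
    y_col = config.get("y_column", "y")
    title = config.get("title", "Chart")

    # The !r conversion applies repr(), so each value is embedded as a
    # quoted Python literal even if it contains quotes or spaces.
    load_source = (
        "import dataiku\n"
        f"df = dataiku.Dataset({dataset!r}).get_dataframe()\n"
        "df.head()"
    )
    plot_source = f"df.plot(x={x_col!r}, y={y_col!r}, title={title!r})"

    def code_cell(source):
        return {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": source,
        }

    return {
        "nbformat": 4,
        "nbformat_minor": 2,
        # Assumed kernel metadata; copy the real values from your reference notebook.
        "metadata": {
            "kernelspec": {
                "name": "python3",
                "display_name": "Python 3",
                "language": "python",
            }
        },
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": f"# {title}"},
            code_cell(load_source),
            code_cell(plot_source),
        ],
    }


notebook_content = build_notebook_content(
    {
        "dataset": "sales",
        "x_column": "month",
        "y_column": "revenue",
        "title": "Revenue by month",
    }
)
```

Inside `run()`, this would be called as `notebook_content = build_notebook_content(self.config)` before the call to `create_jupyter_notebook`. Generating the cell source from the configuration keeps the notebook independent of the macro once it has been created.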
We specified a macro role in runnable.json, which means that you can run the macro directly from a dataset, with the "dataset" parameter pre-filled with the dataset you selected.
Please let me know if you have any questions.
Thanks,
Zach M
Answers
-
Hi @weaam7,
I'm not sure what you mean when you say that you're creating a custom recipe.
Can you please elaborate on what type of recipe you're creating? For example, is it a Python code recipe? Or are you developing a custom plugin?
Also, please provide some more information about why you want the recipe to output a notebook. Notebooks aren't typically used as the output of a recipe, so there might be a better way to accomplish your goal.
Thanks,
Zach
-
Hello @ZachM,
I am trying to develop a custom plugin in Python. The input is a dataset, specifying x, y, etc., which will be used to generate a complete notebook for the chosen chart type.
That is the main idea. Is there anything I need to clarify?
Thank you.
-
Ignacio_Toledo
Hello @weaam7,
If I understand you correctly, you'd like to create a Jupyter notebook report as the output? I think that the nbformat package can be used to that end, as shown in this gist:
https://gist.github.com/fperez/9716279
Maybe that can help you to start.
-
Hello @ZachM,
Big thanks! This is exactly what I want.
I have one last question: how can I use the inputs of my plugin (x, y, titles, etc.) within the macro, in the 'notebook_content' field?
-
Hello @Ignacio_Toledo,
I tried to use this library, but I couldn't figure out why write() doesn't work when I use it in the plugin script.
Thanks for your help!
-
Hello @ZachM,
Thanks, everything is clear now and it worked just fine!