Creating output datasets through an R notebook, and re-usability of R notebooks in general.
I have the following use case in DSS with R and some questions about it. I already have an R notebook containing data aggregation and visualization code, and the intended workflow is:
1. Colleagues upload their data,
2. Make minor modifications to the generic R notebook,
3. Save/export the aggregations and visualizations on their local machines.
I would like to know whether there is a command for creating output datasets directly from an R notebook in DSS. I understand that when an R recipe is created in DSS, the output datasets are specified beforehand and then populated by the R code, but I am not sure how to do the same from an R notebook.
Regarding re-usability, how can I share the R notebook with colleagues? For example, do I need to share the project flow, which would in turn make the R notebook accessible to them, or is there another way?
Also, can an R notebook be created and executed independently, without an R recipe in the DSS project/flow? I have both an R recipe and an R notebook that contain almost the same code, but I would like to switch to the R notebook completely if possible, as it offers interactivity that a recipe does not.
Thanks in advance; I look forward to hearing your views and suggestions.
Regards,
Aditya.
Answers
I would like to know if there is a way to create output data sets directly through a R notebook
You will not be able to create a new dataset through the R API. However, just as a recipe does, a notebook can write content to any dataset that already exists in the project.
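As a minimal sketch of that approach, assuming the target dataset has already been created in the project: the dataset names (uploaded_data, aggregated_output), the column names, and the helper aggregate_by_group are hypothetical placeholders, while dkuReadDataset and dkuWriteDataset come from the dataiku R package that is available inside DSS.

```r
# Hypothetical helper: sum one column per group. Keeping it as a plain
# function makes it easy to reuse in either a notebook or a recipe.
aggregate_by_group <- function(df, group_col, value_col) {
  agg <- aggregate(df[[value_col]], by = list(df[[group_col]]), FUN = sum)
  names(agg) <- c(group_col, paste0("total_", value_col))
  agg
}

# The dataiku package is only available inside DSS, so the read/write
# step is skipped when this code is run anywhere else.
if (requireNamespace("dataiku", quietly = TRUE)) {
  library(dataiku)

  # "uploaded_data" is a placeholder for a dataset in the project.
  input_df <- dkuReadDataset("uploaded_data")

  # "category" and "amount" are placeholder column names.
  result <- aggregate_by_group(input_df, "category", "amount")

  # "aggregated_output" is a placeholder; per the answer above, it must
  # already exist in the project because the R API will not create it.
  dkuWriteDataset(result, "aggregated_output")
}
```

Colleagues adapting the notebook would mainly need to change the dataset and column names to match their own project.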
share the R notebook with other colleagues
You can provide notebook templates. Typically you would do this through a plugin; the documentation for this component is here: https://doc.dataiku.com/dss/latest/notebooks/predefined-notebooks.html#creating-your-own-prebuilt-templates
Also, can a R notebook be created and executed independently without an R recipe in the DSS project/flow
When creating an R recipe, you can convert it to a notebook, and vice versa. You cannot have a notebook as a step in the flow. You can run a notebook on its own and do whatever you want in it, but that work will not appear in the flow, so it is not well suited to reproducibility and auditing.