Exporting a model to a Jupyter notebook doesn't preserve the text feature handling
I have a model open within an Analysis and exported it to a Jupyter notebook.
The model has one text feature that uses TF/IDF vectorization:
The model in the notebook uses TruncatedSVD/HashingVectorizer instead. This is the 'default' option on the model design page, i.e. the option that gets selected when a text feature is added to a model:
However, I changed that default option to TF/IDF vectorization, as is evident from the second image, and then trained the model.
I can modify the notebook and use tf-idf as designed.
But the question is whether it is possible to export a model exactly the way it was designed.
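For reference, the manual change mentioned above might look roughly like the sketch below in scikit-learn. This is not the code that DSS generates: the sample texts, variable names, and parameter values (`n_features`, `n_components`) are assumptions made only for illustration.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import HashingVectorizer, TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical sample data standing in for the text feature column.
texts = [
    "the model was exported to a notebook",
    "the notebook uses a hashing vectorizer",
    "tf-idf was selected in the design page",
    "text features need vectorization",
]

# Roughly the kind of handling found in the exported notebook:
# feature hashing followed by dimensionality reduction with SVD.
# (Parameter values are assumptions, not the DSS defaults.)
hashing_svd = make_pipeline(
    HashingVectorizer(n_features=2**10),
    TruncatedSVD(n_components=2, random_state=0),
)
X_hash = hashing_svd.fit_transform(texts)

# Manual replacement: TF-IDF vectorization, as chosen in the design.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(texts)

print(X_hash.shape)   # one row per document, one column per SVD component
print(X_tfidf.shape)  # one row per document, one column per vocabulary term
```

The TF-IDF matrix keeps one column per vocabulary term, whereas the hashing/SVD pipeline compresses the text into a fixed number of dense components, so downstream steps in the notebook may need adjusting after the swap.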
Best Answer
Hi,
Notebook generation only exports a "similar" model (documentation here: https://doc.dataiku.com/dss/latest/machine-learning/models-export.html#export-to-jupyter-notebook). It is not possible to export the exact same model, as the actual code might be much more complex than something that can fit in a human-editable notebook. The idea is to provide a good enough starting point that data scientists can actually build on.
Regards,
Joachim Zentici
Dataiku
Answers
davidmakovoz
Thank you for the quick response. In that case, maybe it would make sense to rename this option from "Export Model" to "Create a similar model"?
The word "Export" is misleading. If I export an Airbus A380, I am expected to deliver an Airbus A380, not an Airbus A340, A350, etc.
Also, I don't think I'd agree with the statement "It is not possible to export the exact same model as the actual code might be much more complex than something that can fit in a human-editable notebook." I believe the opposite is true: one can do much more, and has more flexibility, using a notebook than working with the predefined set of options of Visual Recipes. After all, all the options of Visual Recipes were originally created by a human in a human-editable notebook (or some equivalent thereof), weren't they?
Most of the resulting code is generated, so even if the source was indeed written by a human, the output is unfortunately not human-readable.