Exporting a model to a Jupyter notebook doesn't preserve the feature handling

davidmakovoz

I opened a model within an Analysis and exported it to a Jupyter notebook.

The model has one text feature that uses TF/IDF vectorization:

The model in the notebook, however, uses TruncatedSVD/HashingVectorizer. This is the 'default' option on the model design page, i.e. the option that gets selected automatically when a text feature is added to a model:

But I changed that default option to TF/IDF vectorization, as shown in the second image, and then trained the model.

I can modify the notebook by hand and use TF/IDF as designed.

But the question is whether it is possible to export a model exactly the way it was designed.
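For readers wondering what that manual change might look like, here is a minimal sketch. It assumes the exported notebook relies on scikit-learn's `HashingVectorizer` followed by `TruncatedSVD`; the actual generated code may differ, and the sample texts, parameter values, and variable names (`texts`, `hashing_svd`, `tfidf`) are hypothetical.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import HashingVectorizer, TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical sample documents standing in for the text feature column.
texts = [
    "the model uses text features",
    "text vectorization with tf idf",
    "export the model to a notebook",
    "hashing and svd are the default",
]

# Roughly what the default option does (assumed): hash the tokens into a
# fixed-size sparse space, then reduce the dimensionality with TruncatedSVD.
hashing_svd = make_pipeline(
    HashingVectorizer(n_features=2 ** 10),
    TruncatedSVD(n_components=2, random_state=0),
)
svd_features = hashing_svd.fit_transform(texts)

# The manual replacement: plain TF/IDF vectorization, one column per term.
tfidf = TfidfVectorizer()
tfidf_features = tfidf.fit_transform(texts)

print(svd_features.shape)    # dense array, one row per document
print(tfidf_features.shape)  # sparse matrix, one row per document
```

The rest of the notebook would then need to consume `tfidf_features` instead of the SVD output, and the fitted `tfidf` object has to be reused at scoring time so that the vocabulary stays the same.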

Best Answer

Answers

  • davidmakovoz
    Thank you for the quick response. In that case, maybe it would make sense to rename this option from "Export Model" to "Create a similar model"?
    The word "Export" is misleading. If I export an Airbus A380, I am expected to deliver an Airbus A380, not an Airbus A340, A350, etc.
    Also, I don't think I agree with the statement "It is not possible to export the exact same model as the actual code might be much more complex than something that can fit in a human-editable notebook." I believe the opposite is true: one can do much more, and with more flexibility, in a notebook than with the predefined set of options in Visual Recipes. After all, all the options of Visual Recipes were originally created by a human in a human-editable notebook (or some equivalent thereof), weren't they? :)
  • cperdigou
    Most of the resulting code is generated, so even if the source was indeed written by a human, the output is unfortunately not human-readable.