Why are the probabilities different?

UserBird (Dataiker, Alpha Tester; Posts: 535)
Hi,

I created a simple model to predict a true/false variable and trained it on about 5,000 records. The probabilities in the "proba_0" and "proba_1" columns on the Predicted Data tab have a reasonable, expected distribution. However, when I export the model to an IPython notebook and run the code manually, the probabilities have a rather extreme distribution, with most of the "proba_0" values close to 0.99.

My question is: is the Python code generated in the IPython notebook the code that DSS actually uses to create the model, or is it simply a rough translation of that code? Why would I be seeing such different numbers?
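
To make the comparison concrete, the two sets of probabilities can be summarized side by side. The following is only a minimal sketch using the Python standard library: the helper names `summarize` and `share_above` and the sample values are hypothetical, and the real inputs would be the "proba_0" column from the Predicted Data tab and the one produced by the notebook.

```python
from statistics import mean, quantiles


def summarize(probas):
    """Return min, quartiles, max and mean of a list of probabilities."""
    q1, median, q3 = quantiles(probas, n=4)
    return {
        "min": min(probas),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(probas),
        "mean": mean(probas),
    }


def share_above(probas, threshold=0.9):
    """Fraction of predictions whose probability exceeds the threshold."""
    return sum(p > threshold for p in probas) / len(probas)


# Hypothetical values, for illustration only: replace them with the
# "proba_0" column from the Predicted Data tab and from the notebook.
dss_proba_0 = [0.12, 0.35, 0.48, 0.61, 0.77, 0.90]
notebook_proba_0 = [0.97, 0.99, 0.99, 0.98, 0.99, 0.02]

for name, values in [("DSS", dss_proba_0), ("notebook", notebook_proba_0)]:
    print(name, summarize(values), "share > 0.9:", share_above(values))
```

A large gap in the quartiles or in the share of values above 0.9 would confirm that the two pipelines are not producing the same model, rather than the difference being a display artifact.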

Answers

  • Clément_Stenac (Dataiker, Dataiku DSS Core Designer, Registered; Posts: 753)
    Hi,

    You are correct: the Jupyter notebook export is a rough translation of how the model is created. It is intended for education and scaffolding, and does not aim to be an exact replica of the model trained in the visual interface.

    We will make this fact clearer in the next release.
  • UserBird (Dataiker, Alpha Tester; Posts: 535)
    Thank you for the quick response. Glad to know they are different.

    Can you offer any guidance on how to reproduce the probability distribution from the Predicted Data tab with the Python code in the Jupyter notebook? Or could you share which technologies are used to predict the probabilities on the Predicted Data tab?