Difference between 'evaluate' recipe and python functions

tamarapepping Registered Posts: 2 ✭✭✭✭

I have a question about the metric output of the 'evaluate' recipe. I created an RF model and then made a prediction on 'new data'. Then I used the 'evaluate' recipe to create the extra 'prediction_correct' column and to get the output file with the metrics. In this file you can find the accuracy, precision, recall, AUC, etc. The scores were much higher than expected, and when I calculated the accuracy etc. in a Jupyter Notebook, the scores were completely different. What am I doing wrong?

Best Answer


  • matthias_funke
    matthias_funke Dataiker Alumni Posts: 12 ✭✭✭✭✭

    Hi Tamara, could you explain what exactly you did when you created the model, and how you used Jupyter in this context?

    I want to make sure you used the same model and the same input data.

    You may even want to post your code here, if it does not contain anything too sensitive.

  • tamarapepping
    tamarapepping Registered Posts: 2 ✭✭✭✭
    I will try to give an overview of the steps I took. However, the data is sensitive, so I cannot post any screenshots or code.
    - I created a model (RF) in the analyses-menu
    - I deployed the model to the flow (I selected a train set and gave the model a name)
    - Then I go to the dataset I want to predict on, click on the dataset and select the 'predict' recipe
    - I choose the input dataset (the dataset I want to predict on) and then select the name of the model. The recipe is created
    - Then I click on the scored dataset and use the 'evaluate' recipe. I select the model and I use the scored dataset as an input. It makes no difference whether I select the scored dataset or the original dataset here.
    - The 'evaluate' recipe produces two datasets, one of which contains the metrics (recall, accuracy, etc.).
    - Against our expectations, these metrics were quite high. So I investigated the other dataset that the 'evaluate' recipe produces. I loaded this dataset in the Jupyter Notebook and used recall_score(), precision_score(), etc. from sklearn. The scores are then different from the metrics. This is also the case if I export the file to Excel and calculate the confusion matrix there.

    I hope you can help me :)
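
The manual check described in the last step can be sketched roughly as follows. This is only an illustration with made-up labels, not the actual data or the code that was run. The function name `binary_metrics` and the lists `y_true` and `y_pred` are hypothetical. It is written in plain Python, and for a binary target the formulas should match what sklearn's accuracy_score(), precision_score() and recall_score() compute.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and basic metrics for a binary target."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        # Guard against division by zero on empty or one-sided inputs.
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up example rows standing in for the target and prediction columns
# of the evaluated dataset (hypothetical values, not real data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)

# A 'prediction_correct'-style flag is simply target == prediction per row,
# so its mean should equal the accuracy.
prediction_correct = [t == p for t, p in zip(y_true, y_pred)]
accuracy_from_flag = sum(prediction_correct) / len(prediction_correct)

print(metrics)
print(accuracy_from_flag)
```

If the numbers computed this way from the evaluated dataset disagree with the metrics dataset, the difference comes from how the metrics were computed rather than from the predictions themselves.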
  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker

    The bug is now fixed in DSS 4.1.3. Thanks for reporting this!