DSS Data Scientist quickstart - Predict and evaluate recipes have different outputs
Hi,
I'm following the DSS Data Scientist quickstart and I see that the "Predictions" and "scored" datasets have completely different prediction values. Is this normal?
I manually checked that they use the same model (a customized Random Forest that was previously deployed) and that their inputs are identical (the only difference I can notice is the column ordering), so I cannot explain the difference between the outputs.
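(For context, a minimal version of that input check in a DSS Python notebook could look like the sketch below; the dataset names test and to_assess_prepared come from the quickstart, so adjust them if your Flow differs.)

```python
import dataiku

# Read both recipe inputs as pandas DataFrames
# (dataset names are the ones used in the quickstart)
test_df = dataiku.Dataset("test").get_dataframe()
assess_df = dataiku.Dataset("to_assess_prepared").get_dataframe()

# Align the column order (the only visible difference) and the row order,
# then compare the values
cols = sorted(set(test_df.columns) & set(assess_df.columns))
a = test_df[cols].sort_values(cols).reset_index(drop=True)
b = assess_df[cols].sort_values(cols).reset_index(drop=True)

print("Same columns:", set(test_df.columns) == set(assess_df.columns))
print("Same values:", a.equals(b))
```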
Thanks
Best Answer
-
taraku (Dataiker)
Hi @adr, welcome to the Dataiku Community! I can confirm there are a few differences. In the Data Scientist Quick Start, the Score recipe is used to predict new, unseen test data. Its inputs are the Random Forest model and the test dataset, and its output is the scored dataset, which includes a prediction column.
The Evaluate recipe is used to evaluate the true performance of the deployed model. Its inputs are the Random Forest model and the to_assess_prepared dataset, which contains the data for the customers in the test dataset along with their known classes. Like the Score recipe's output, the Evaluate recipe's output includes a prediction column.
Comparing these two prediction columns, I find fewer than 10 records where the prediction value differs. One thing that can account for these small differences is that the two recipes use different input datasets.
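If you want to quantify this yourself, you can compare the two prediction columns in a Python notebook. The sketch below is only illustrative: the output dataset names (test_scored and predictions) and the join key (customerID) are assumptions, so substitute the names that appear in your own Flow.

```python
import dataiku

# Output dataset names and the key column are assumptions --
# replace them with the names used in your Flow
scored = dataiku.Dataset("test_scored").get_dataframe()
evaluated = dataiku.Dataset("predictions").get_dataframe()

# Join the two outputs on the customer key, keeping both prediction columns
merged = scored[["customerID", "prediction"]].merge(
    evaluated[["customerID", "prediction"]],
    on="customerID",
    suffixes=("_score", "_evaluate"),
)

# Count the records where the two recipes disagree
diff = merged[merged["prediction_score"] != merged["prediction_evaluate"]]
print(f"{len(diff)} differing predictions out of {len(merged)} records")
```

Joining on a stable key rather than relying on row order is what makes the comparison meaningful when the two datasets are written in a different order.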
Related posts: See Predict vs Evaluate recipe to join the discussion about the difference between the outputs of the Evaluate recipe and the Score/Predict recipes.
To go further: Dataiku provides two key features for selecting the best model for a particular use case: model comparisons and model evaluation stores. To try these features, visit How-To: Model Comparisons and Model Evaluation Stores.
Answers
-
Hi, thank you for your answer. In the end, I am not able to reproduce the error. I continued the tutorial and re-ran the scenario multiple times, and now I notice only small differences, like in Predict recipe vs Evaluate recipe. It must be a mistake on my side, but it is hard to explain because, as you said, the only difference comes from the input datasets, and they hold identical values since to_assess_prepared derives from test.