Interpretation of model performance in Dataiku built-in models

Povilas
Hi!

I created a model using Dataiku's built-in models. However, the results look quite suspicious, so I would like to ask a few questions.

In the attached screenshot you can see that the model is a decision tree trained with 10-fold cross-validation. The model was created purely for testing, so I intentionally set the maximum tree depth to 100. This makes the tree very deep (I could see that in the Interpretation section), and the model should be overfitting heavily: it should perform very well on the training set and poorly on the test set. With cross-validation it should also perform poorly, because each fold should be evaluated on data the model was not trained on. However, the report shows an AUC of 0.892. Can you explain why we get this kind of performance, which seems clearly wrong for this model? And on which data exactly is the ROC AUC shown in the centre calculated?
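To make the question concrete, here is a minimal sketch of what I expected to see, using scikit-learn on a synthetic placeholder dataset (the data and settings below are illustrative assumptions, not my actual data and not the DSS internals): near-perfect AUC on the training folds, but a much lower AUC on the held-out folds.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

cv_results = cross_validate(
    DecisionTreeClassifier(max_depth=100, random_state=0),
    X, y,
    cv=10,                       # 10-fold CV, as in the DSS model design
    scoring="roc_auc",
    return_train_score=True,     # also score each model on its training folds
)

print("mean AUC on training folds:", cv_results["train_score"].mean())  # close to 1.0
print("mean AUC on held-out folds:", cv_results["test_score"].mean())   # noticeably lower
```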



Povilas
Clément_Stenac
The resulting metric is the average of the metric computed on each of the 10 folds (each time on the held-out fold).

What kind of performance are you getting (a rough comparison sketch follows after these questions):
* Without K-fold?
* On a reasonably sized decision tree?
* On a reasonably sized random forest?
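For reference, here is a rough sketch of the kind of comparison these questions point at, using scikit-learn on a synthetic placeholder dataset (the dataset, model settings, and split are illustrative assumptions, not DSS internals):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "shallow tree (max_depth=5)": DecisionTreeClassifier(max_depth=5, random_state=0),
    "very deep tree (max_depth=100)": DecisionTreeClassifier(max_depth=100, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Score every model on the same held-out test set
for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```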
Povilas
* Without k-fold (a simple train/test split) I get very similar performance. Again, it is a very deep tree, and I suspect this is not the performance on the test set but on the train set.
* A decision tree with max depth = 5 gives 0.64 AUC.
* A typical random forest gives 0.95 AUC (!)

I tried the same dataset in code with XGBoost and GBM; neither gives more than 0.75 AUC with 10-fold CV.
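For completeness, the XGBoost check was roughly along these lines; the sketch below uses xgboost's scikit-learn wrapper on a synthetic placeholder dataset (the dataset and hyperparameters are illustrative, not the exact ones I used):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Out-of-fold AUC over 10 folds, same protocol as the DSS K-fold run
scores = cross_val_score(
    XGBClassifier(n_estimators=200, max_depth=5, learning_rate=0.1),
    X, y,
    cv=10,
    scoring="roc_auc",
)
print("per-fold AUC:", scores.round(3))
print("mean out-of-fold AUC:", scores.mean())
```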
Clément_Stenac
We confirm that all performance metrics shown in DSS are computed on the test set; we currently never show performance on the train set. In the case of K-fold, it is the mean of the out-of-fold metrics (so test data too).

So you do see a downward trend when going from the reasonably sized random forest to the very deep tree (from 0.95 to 0.892), which is indeed probably indicative of overfitting, although it is not as severe as you expected. Possible reasons: (a) your train and test sets are very similar; (b) the random selection of features adds enough diversity to counteract part of the overfitting effect. It can also happen if you do not have that much data, which means the trees are never fully grown.
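One way to check the "trees are not fully grown" point outside DSS is to fit the same very deep tree with scikit-learn and inspect the depth and leaf count it actually reaches; a minimal sketch on a synthetic placeholder dataset (illustrative data, not your project):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

tree = DecisionTreeClassifier(max_depth=100, random_state=0).fit(X, y)
print("requested max_depth:", 100)
print("depth actually reached:", tree.get_depth())   # typically well below 100 on modest data
print("number of leaves:", tree.get_n_leaves())
```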