How does Dataiku calculate testing scores (with cross validation)?
Felix_R
Dataiku DSS Core Designer, Registered Posts: 1 ✭✭
- I split my dataset (1,000 observations) into train (800) and test (200), then trained 3 models with hyper-parameter search and 5-fold cross-validation. In the visual analysis, on the "Result" tab (table view), I can see the metric value, e.g. R^2 = 0.7 (+/- 0.3). Is that value the average of the metric across all folds, and does the +/- correspond to the spread of the metric values across the cross-validation folds? I found this article, which seems to suggest that Dataiku automatically splits my train dataset into train and test again. In that case, I don't understand why there would be a +/- in the result.
- Related to the first question, how can I find the training, validation, and testing metrics in Dataiku?
- Additionally, I have a question about a random forest model I built for a regression problem. In the model result, under "Explainability" > "Variable importance", I see both "feature A" and "feature A (computed)". Dataiku applied scaling to feature A, a numeric variable. Is the computed one feature A after scaling, and did Dataiku keep the original feature A column as well? Is there a way for me to see the actual dataset after Dataiku's feature engineering/selection, before it was fed into the training step?
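
To make the first question concrete, here is a minimal sketch of what I assume a "mean +/- spread across folds" metric would look like. It is not Dataiku's implementation: the data is synthetic, a one-feature least-squares line stands in for the real models, and the helper names (`fit_line`, `r2`) are made up for illustration. It only shows how 5-fold cross-validation on the 800 training rows yields five R^2 values, which can then be summarized as a mean and a standard deviation, separately from the score on the 200 held-out rows.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def r2(model, xs, ys):
    """Coefficient of determination of `model` on (xs, ys)."""
    a, b = model
    my = statistics.fmean(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

rng = random.Random(0)

# Synthetic stand-in for the real dataset: 1000 observations.
data = []
for _ in range(1000):
    x = rng.uniform(0, 3)
    data.append((x, 2.0 * x + rng.gauss(0, 1.0)))

# Outer split: 800 rows for training, 200 rows held out for testing.
rng.shuffle(data)
train, test = data[:800], data[800:]

# Inner 5-fold cross-validation on the training rows only.
k = 5
folds = [train[i::k] for i in range(k)]
fold_scores = []
for i in range(k):
    val = folds[i]
    fit_rows = [row for j, f in enumerate(folds) if j != i for row in f]
    model = fit_line(*zip(*fit_rows))
    fold_scores.append(r2(model, *zip(*val)))

# One possible summary: mean and sample standard deviation over the folds.
cv_mean = statistics.fmean(fold_scores)
cv_std = statistics.stdev(fold_scores)

# Separate estimate: refit on all 800 training rows, score on the 200 test rows.
final_model = fit_line(*zip(*train))
test_r2 = r2(final_model, *zip(*test))

print(f"CV R^2: {cv_mean:.3f} (+/- {cv_std:.3f}) over {k} folds")
print(f"Test R^2: {test_r2:.3f}")
```

If the displayed +/- is computed along these lines, it would describe fold-to-fold variation rather than a single train/test split, which is what I would like to confirm. Whether the spread is one standard deviation, a multiple of it, or a min/max range is exactly the part I am unsure about.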