How does Dataiku calculate testing scores (with cross validation)?

Felix_R
Level 1
  1. I split my dataset (1,000 observations) into train (800) and test (200), then trained 3 models with hyper-parameter search and 5-fold cross-validation. I can see the resulting metric value in the “visual analysis” - “result” tab (table view), e.g. R^2 = 0.7 (+/- 0.3). Is that value the average of the metric across all folds, and does the +/- correspond to the range of the metric values across the cross-validation folds? I found this article, which seems to suggest that Dataiku automatically splits my train dataset into train and test again. In that case, I don't understand why there would be a +/- in the result.
  2. Related to the first question: how can I find the training, validation, and testing metrics in Dataiku?
  3. Additionally, I have a question about a random forest model I built for a regression problem. In the model result, under “explainability” – “variable importance”, I see both “feature A” and “feature A (computed)”. Dataiku applied scaling to feature A, which is a numeric variable. Is the computed one feature A after scaling, with the original feature A column kept as well? And is there a way for me to see the actual dataset after Dataiku's feature engineering/selection, before it is fed into the training step?
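
To make question 1 concrete, here is a minimal pure-Python sketch of the calculation I have in mind. It is not a claim about what Dataiku does internally: the synthetic data, the one-feature linear model, and the helper names (`fit_line`, `r2`, `kfold_r2`) are all assumptions for illustration. It computes R^2 on each of 5 held-out folds of an 800-row training set and then reports the mean with two candidate "+/-" values (standard deviation and half-range across folds), since I don't know which one, if either, the result tab shows.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b on one feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def r2(ys, preds):
    """Coefficient of determination on a held-out set."""
    my = statistics.fmean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def kfold_r2(xs, ys, k=5, seed=0):
    """Return one held-out R^2 score per fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a * xs[i] + b for i in held_out]
        scores.append(r2([ys[i] for i in held_out], preds))
    return scores


# Synthetic stand-in for the 800-row training set (assumption).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(800)]
ys = [2 * x + rng.gauss(0, 3) for x in xs]

scores = kfold_r2(xs, ys, k=5)
mean_r2 = statistics.fmean(scores)
std_r2 = statistics.stdev(scores)               # candidate 1 for "+/-"
half_range = (max(scores) - min(scores)) / 2    # candidate 2 for "+/-"

print(f"R^2 = {mean_r2:.3f} (+/- {std_r2:.3f} std, +/- {half_range:.3f} half-range)")
```

If the number in the result tab is computed this way, the 200-row test set is never touched during cross-validation, which is why I'm unsure how the extra train/test split mentioned in the article fits in.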