Comparing models with test data scores and selecting features
Hello,
I am currently building a flow to train a binary classification model. After training on my training data, I want to compare the top three models on my test dataset (accuracy, precision, recall, etc.). I know how to do this on the training dataset, but I am unsure how to do it on the test data and compare the results via model comparisons.
Also, after training on the entire set of features, is there a way to select only the top 5 features and train a new model on them?
Thank you!
Answers
-
Alexandru (Dataiker)
Hi,
To evaluate on the test dataset, first split your data with the Split recipe, then use the two resulting datasets as explicit extracts for your train and test sets.
You can set this up in Visual Analysis > select the model > Design > Train/Test Set by choosing "Explicit extracts from two datasets".
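For reference, the metrics mentioned in the question all come from counting true/false positives and negatives on the held-out test rows. The sketch below is plain Python and does not depend on DSS. The function name `binary_metrics`, the model names, and the label lists are hypothetical placeholders for your own test labels and each model's predictions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)

    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical test-set labels and predictions from three candidate models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 0, 1, 0, 1, 1, 0],
    "model_b": [1, 1, 1, 1, 0, 1, 1, 0],
    "model_c": [1, 0, 0, 0, 0, 0, 1, 0],
}

# One metrics dict per model, all evaluated on the same test rows.
results = {name: binary_metrics(y_true, y_pred) for name, y_pred in predictions.items()}

for name, metrics in results.items():
    print(name, metrics)
```

In this made-up example all three models have the same accuracy but trade precision against recall differently, which is why it is worth comparing more than one metric on the test set.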
To reduce the number of features you can have a look at: https://doc.dataiku.com/dss/latest/machine-learning/supervised/settings.html#settings-feature-reduction
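If you prefer to pick the top 5 features yourself, the idea is simply to rank features by an importance score and keep the first five. A minimal sketch in plain Python, assuming you already have per-feature importance scores from your trained model. The feature names, the scores, and the helper `top_k_features` are hypothetical and are not part of the DSS API.

```python
def top_k_features(importances, k=5):
    """Return the names of the k features with the highest importance."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical importance scores; replace with the values from your own model.
importances = {
    "age": 0.31,
    "income": 0.22,
    "tenure": 0.15,
    "region": 0.04,
    "num_calls": 0.12,
    "plan_type": 0.09,
    "has_app": 0.07,
}

selected = top_k_features(importances, k=5)
print(selected)
```

You could then keep only the selected columns as inputs when training the new model.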
Let me know if that helps.
-
Hi,
But where do we see the results on the test dataset after splitting? How do we get the recall, precision, accuracy, etc.?