Comparing models with test data scores and selecting features

nsrishan · July 2022

Hello,

I am currently building a flow for a binary classification model. After training on my training data, I want to compare the results of the top three models on my test dataset (accuracy, precision, recall, etc.). I know how to do this on the train dataset, but I am unsure how to do it on the test data and compare the results via model comparisons.

Also, after running my flow on the entire set of features, is there a way to only select the top 5 features to run a new model on?

Thank you!

Alexandru · August 2022

Hi,

To evaluate on the test dataset you would need to perform the split using the split recipe and then use explicit extracts for your train/test sets.

You can do this in Visual Analysis > select the model > Design > Train/Test Set by choosing "Explicit extracts from two datasets".

Screenshot 2022-08-15 at 10.41.11.png
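For reference, the metrics themselves are computed the same way whichever tool reports them. Here is a minimal Python sketch, assuming you have exported the true labels and each model's predictions on the test set. The names `y_test`, `model_a`, `model_b` and `model_c` and all of the values are hypothetical, and this is plain Python rather than the DSS API:

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def compare_models(y_true, predictions_by_model):
    """Score every model against the same held-out test labels."""
    return {
        name: classification_metrics(y_true, preds)
        for name, preds in predictions_by_model.items()
    }


# Hypothetical test-set labels and predictions from three candidate models.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 1, 1, 0],
    "model_b": [1, 1, 1, 1, 0, 0, 1, 0],
    "model_c": [0, 0, 1, 0, 0, 0, 1, 0],
}

results = compare_models(y_test, predictions)
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

The key point is that all three models are scored against the same test extract, so the numbers are directly comparable.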

To reduce the number of features you can have a look at: https://doc.dataiku.com/dss/latest/machine-learning/supervised/settings.html#settings-feature-reduction
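If you would rather pick the top 5 features by hand and retrain on just those, the idea can be sketched in plain Python. This assumes you have copied the feature importances of your trained model into a dictionary. The feature names, the importance values and the `top_k_features` helper are all hypothetical, not something DSS provides:

```python
def top_k_features(importances, k=5):
    """Return the k feature names with the highest importance, best first."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def keep_columns(rows, columns):
    """Project each row (a dict) onto the selected columns only."""
    return [{col: row[col] for col in columns} for row in rows]


# Hypothetical importances copied from a trained model's report.
importances = {
    "age": 0.21,
    "income": 0.18,
    "tenure": 0.15,
    "region": 0.04,
    "num_orders": 0.25,
    "last_login_days": 0.12,
    "has_coupon": 0.05,
}

selected = top_k_features(importances, k=5)

# Hypothetical single-row dataset, reduced to the selected features.
rows = [{"age": 34, "income": 52000, "tenure": 3, "region": "EU",
         "num_orders": 12, "last_login_days": 2, "has_coupon": 0}]
reduced = keep_columns(rows, selected)
print(selected)
```

You would then train the new model using only the columns in `selected` as inputs.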

Let me know if that helps.

sir · March 2023

Hi,

But where do we see the results on the test dataset after splitting? How do we get the recall, precision, accuracy, etc.?
