
Comparing models with test data scores and selecting features

nsrishan
Level 2

Hello,

I am currently building a flow for a binary classification model. After training on my training data, I want to compare the top three models on my test dataset (accuracy, precision, recall, etc.). I know how to do this on the training dataset, but I am unsure how to do it on the test data and compare the results via model comparisons.

Also, after training on the entire set of features, is there a way to select only the top 5 features and train a new model on them?

Thank you!

1 Reply
AlexT
Dataiker

Hi,

To evaluate on the test dataset, you would first need to split your data with the Split recipe and then use explicit extracts for your train and test sets.

You can do this in the Visual Analysis: select the model, go to Design > Train/Test Set, and choose "Explicit extracts from two datasets".

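For reference, here is a minimal sketch in plain Python of what comparing models on a held-out test set amounts to: each model is scored only on rows it never saw during training. This is not Dataiku's API. The function `binary_metrics`, the model names, and the labels and predictions are all made up for illustration.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical test-set labels and the predictions of three trained models.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
test_predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 0],
    "model_b": [1, 1, 1, 1, 0, 1, 1, 0],
    "model_c": [0, 0, 1, 0, 0, 0, 1, 1],
}

# Score every model on the same held-out rows so the numbers are comparable.
comparison = {
    name: binary_metrics(y_test, preds)
    for name, preds in test_predictions.items()
}

for name, scores in comparison.items():
    print(name, scores)
```

With explicit extracts, the test dataset plays the role of `y_test` here, so the metrics shown for each model in the analysis are computed on that dataset rather than on the training data.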

To reduce the number of features you can have a look at: https://doc.dataiku.com/dss/latest/machine-learning/supervised/settings.html#settings-feature-reduct... 
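If you prefer to pick the top 5 features yourself, the idea can be sketched in plain Python as below. This is only an illustration under assumptions, not how DSS implements feature reduction: the importance scores, feature names, rows, and the helpers `top_k_features` and `keep_columns` are all hypothetical.

```python
from typing import Dict, List


def top_k_features(importances: Dict[str, float], k: int = 5) -> List[str]:
    """Return the k feature names with the highest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]


def keep_columns(rows: List[Dict[str, float]], columns: List[str]) -> List[Dict[str, float]]:
    """Project each row onto the selected columns only."""
    return [{c: row[c] for c in columns} for row in rows]


# Hypothetical importances, e.g. copied by hand from a trained model's report.
importances = {
    "age": 0.30, "income": 0.22, "tenure": 0.15, "region": 0.03,
    "visits": 0.12, "score": 0.10, "channel": 0.08,
}
selected = top_k_features(importances, k=5)

# Hypothetical training rows; the target column is kept alongside the features.
rows = [
    {"age": 41, "income": 52.0, "tenure": 3, "region": 2,
     "visits": 7, "score": 0.4, "channel": 1, "target": 1},
    {"age": 29, "income": 31.5, "tenure": 1, "region": 5,
     "visits": 2, "score": 0.9, "channel": 3, "target": 0},
]
reduced_rows = keep_columns(rows, selected + ["target"])

print(selected)
print(reduced_rows[0])
```

The reduced dataset can then be used to train a new model, and the result should be compared on the same test set as the full-feature model.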

Let me know if that helps. 
