Comparing models with test data scores and selecting features

nsrishan
Level 2

Hello,

I am currently building a flow for a binary classification model. After training the model on my training data, I want to compare the results of the top three models on my test dataset (accuracy, precision, recall, etc.). I know how to do this on the train dataset, but I am unsure how to do it on the test data and compare the results via model comparisons.

Also, after running my flow on the entire set of features, is there a way to select only the top 5 features and train a new model on them?

Thank you!

AlexT
Dataiker

Hi,

To evaluate on the test dataset, you would first need to split your data with a Split recipe and then use explicit extracts for your train and test sets.

You can set this in Visual Analysis > select the model > Design > Train/Test Set, then choose "Explicit extracts from two datasets".
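If it helps to see what the test-set comparison amounts to, here is a minimal sketch in plain Python. It is not the Dataiku API: the function names, model names, labels, and predictions are all made up for illustration, and it assumes you have the true test labels plus each model's predicted class for the same test rows.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def compare_on_test(y_test, predictions_by_model):
    """Score every model against the same test labels, keyed by model name."""
    return {name: binary_metrics(y_test, preds)
            for name, preds in predictions_by_model.items()}


# Hypothetical test labels and predictions from three trained models.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
predictions_by_model = {
    "random_forest": [1, 0, 1, 0, 0, 0, 1, 0],
    "logistic_regression": [1, 1, 1, 1, 0, 0, 0, 0],
    "gradient_boosting": [1, 0, 1, 1, 0, 1, 1, 0],
}

results = compare_on_test(y_test, predictions_by_model)

# Print models from highest to lowest test accuracy.
for name, scores in sorted(results.items(),
                           key=lambda kv: kv[1]["accuracy"], reverse=True):
    print(name, scores)
```

The key point is that every model is scored against the same held-out rows, which is what the explicit test extract gives you.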

To reduce the number of features you can have a look at: https://doc.dataiku.com/dss/latest/machine-learning/supervised/settings.html#settings-feature-reduct... 
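As a rough illustration of the "top 5 features" idea, here is a small sketch in plain Python. It does not use the Dataiku API, and the feature names and importance scores are invented; it simply assumes you have a per-feature importance score from your trained model and want the names of the k highest-scoring features to keep for a new training run.

```python
def top_k_features(importances, k=5):
    """Return the names of the k features with the highest importance."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical importance scores, e.g. copied from a trained model's report.
importances = {
    "age": 0.31,
    "income": 0.22,
    "tenure": 0.15,
    "region": 0.04,
    "num_orders": 0.12,
    "device": 0.02,
    "last_login_days": 0.14,
}

selected = top_k_features(importances, k=5)
print(selected)
```

You could then keep only the selected columns as input features for the new model and reject the rest.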

Let me know if that helps. 
