Comparing models with test data scores and selecting features

nsrishan
Level 2
Hello,

I am currently building a flow for a binary classification model. After training the model on my training data, I want to compare the results of the top three models on my test dataset (accuracy, precision, recall, etc.). I know how to do this on the train dataset, but I am unsure how to do it on the test data and compare the models via model comparisons.

Also, after training on the entire set of features, is there a way to select only the top 5 features and train a new model on them?

Thank you!

2 Replies
AlexT
Dataiker

Hi,

To evaluate on the test dataset, you would need to split your data using the Split recipe and then use explicit extracts for your train/test sets.

You can do this in Visual Analysis > select the model > Design > Train/Test Set, then choose "Explicit extracts from two datasets".

[Attached screenshot: Screenshot 2022-08-15 at 10.41.11.png]
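For reference, here is a minimal sketch of what comparing models on the test set boils down to. It is plain Python, not Dataiku's API, and the labels, predictions, and model names (`model_a`, `model_b`, `model_c`) are made up for illustration. It assumes you already have the true test labels and each model's predictions on the same test rows.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall from confusion-matrix counts."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical test-set labels and per-model predictions (illustration only).
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 1, 1, 0],
    "model_b": [1, 1, 1, 1, 0, 1, 1, 0],
    "model_c": [0, 0, 1, 1, 0, 0, 1, 0],
}

# Score every model on the same held-out test rows.
scores = {name: binary_metrics(y_test, preds) for name, preds in predictions.items()}

# Print the models side by side, best accuracy first.
for name, m in sorted(scores.items(), key=lambda kv: kv[1]["accuracy"], reverse=True):
    print(f"{name}: accuracy={m['accuracy']:.3f} "
          f"precision={m['precision']:.3f} recall={m['recall']:.3f}")
```

The important point is that all models are scored against the same test extract, so the numbers are directly comparable.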

To reduce the number of features, you can have a look at the feature reduction settings: https://doc.dataiku.com/dss/latest/machine-learning/supervised/settings.html#settings-feature-reduct...
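If you would rather pick the top 5 features by hand, the idea can be sketched as below. This is plain Python and not Dataiku's built-in feature reduction. The feature names, importance values, `target` column, and sample row are all hypothetical, and it assumes you have copied the importances from your trained model.

```python
def top_k_features(importances, k=5):
    """Return the names of the k features with the highest importance."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical feature importances taken from a trained model.
importances = {
    "age": 0.31,
    "income": 0.22,
    "tenure": 0.15,
    "region": 0.04,
    "num_orders": 0.12,
    "last_login_days": 0.09,
    "gender": 0.02,
    "plan_type": 0.05,
}

selected = top_k_features(importances, k=5)
print(selected)

# Keep only the selected features plus the target column before retraining.
rows = [
    {"age": 42, "income": 55000, "tenure": 3, "region": "EU", "num_orders": 7,
     "last_login_days": 2, "gender": "F", "plan_type": "pro", "target": 1},
]
keep = selected + ["target"]
reduced_rows = [{col: row[col] for col in keep} for row in rows]
```

You would then train a new model on the reduced dataset and evaluate it on the same test extract as the original models.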

Let me know if that helps. 

sir
Level 1

Hi,

But where do we see the results on the test dataset after splitting? How do we get the recall, precision, accuracy, etc.?
