
Feature reduction, hyperparameter optimization and cross-validation in the Lab

aflatoun
Level 1
Hi there,

Here is my simplified use case:

Suppose I want to train a RandomForest for a classification task. I want to test two different models (one RF with 100 trees, another with 500 trees), and also three different feature reduction methods (say, LASSO regression with regularization=0.1, LASSO regression with regularization=0.01, and tree-based reduction with the default values).

I'm using explicit extracts in the "train/test set" tab and 5-fold CV in the "hyperparameters" tab. Since I can only choose a single feature reduction method in the "feature reduction" tab, I've designed two sessions, one for LASSO reduction and another for tree-based reduction, where all the remaining parameters are the same (same train/test split, same CV strategy with the same random state, same algorithms and hyperparameters).

My question is: are feature reduction and model training conducted jointly or sequentially?

In other words, is the feature reduction step treated as an "additional hyperparameter", meaning that the feature reduction step is rerun for each iteration of the CV, or is it performed once and for all?

If it is performed once and for all, what strategy is used to discriminate between the different values of the LASSO regularization hyperparameter? And how can two different feature reduction methods be formally compared, since it is not possible to select more than one in a single session?
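To make the "jointly" option concrete, here is a minimal scikit-learn sketch of what I mean by treating feature reduction as an additional hyperparameter. It runs outside Dataiku and is not a claim about how the Lab behaves internally. The synthetic dataset, the use of `Lasso(alpha=...)` as a stand-in for the LASSO regularization setting, and the 50-tree forest used for tree-based selection are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Synthetic data standing in for the real dataset (assumption).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# Three candidate feature reduction steps (illustrative stand-ins):
# two LASSO selectors with different regularization, one tree-based selector.
lasso_strong = SelectFromModel(Lasso(alpha=0.1))
lasso_weak = SelectFromModel(Lasso(alpha=0.01))
tree_based = SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))

# Putting the selector inside the pipeline means it is refit on the
# training part of every CV fold, so the held-out fold never influences
# which features are kept.
pipe = Pipeline([
    ("reduce", tree_based),
    ("rf", RandomForestClassifier(random_state=0)),
])

# The selector itself is searched over like any other hyperparameter:
# 3 reduction methods x 2 forest sizes = 6 candidates.
param_grid = {
    "reduce": [lasso_strong, lasso_weak, tree_based],
    "rf__n_estimators": [100, 500],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

In the "sequential" alternative, the selector would be fit once on the whole training set before the cross-validation starts, and only `rf__n_estimators` would be searched.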

Thank you!
