Model feature column reduction
I have created a random forest model using lasso regression for feature reduction. This model took around 10 minutes to run. I need it to run a lot faster, and I was wondering how to see which feature columns the lasso regression removed, so that I can remove those columns manually from my next model and get the same result in less time.
Any help would be appreciated.
Answers
JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 297 Dataiker
Hi @bnichols,
After models have been trained, you can check which features the model used and which were rejected under "Features".
You can also retrieve a list of all the selected columns using the DSS Python API by obtaining a handle to the ML task: https://developer.dataiku.com/latest/concepts-and-examples/ml.html#obtaining-a-handle-to-an-existing-ml-task.
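As a rough illustration of that lookup, here is a minimal sketch rather than a confirmed recipe. The arguments `project_key`, `analysis_id` and `mltask_id` are placeholders for your own values, and the layout of the preprocessing settings (a `per_feature` mapping whose entries carry a `role` such as `INPUT` or `REJECT`) is an assumption to check against your DSS version. These roles describe feature handling and may not reflect the subset kept after lasso reduction, so compare the output with the "Features" view.

```python
def split_features_by_role(per_feature):
    """Group feature names by their 'role' value (e.g. INPUT, REJECT)."""
    by_role = {}
    for name, params in per_feature.items():
        by_role.setdefault(params.get("role", "UNKNOWN"), []).append(name)
    return {role: sorted(names) for role, names in by_role.items()}


def list_feature_roles(project_key, analysis_id, mltask_id):
    """Collect per-feature roles for every trained model of an ML task.

    Only works inside DSS or with a configured API client.
    """
    import dataiku  # imported lazily so the helper above runs without DSS

    client = dataiku.api_client()
    mltask = client.get_project(project_key).get_ml_task(analysis_id, mltask_id)
    roles = {}
    for model_id in mltask.get_trained_models_ids():
        details = mltask.get_trained_model_details(model_id)
        # Assumption: the settings dict exposes a "per_feature" mapping.
        per_feature = details.get_preprocessing_settings().get("per_feature", {})
        roles[model_id] = split_features_by_role(per_feature)
    return roles
```

Calling `list_feature_roles(...)` from a notebook inside DSS should give one dictionary of roles per trained model, which you can compare against the columns of your input dataset.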
Note that with feature reduction, not having the features as input will generally result in an error. For example, when scoring a dataset you will receive an error about missing columns. Also, if you retrain the model with new data, a different subset of features may be selected the next time. If you want to freeze the variables used, you would need to select the desired features and then retrain, but this won't work if feature reduction is still selected.
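To make the two points above concrete, here is a small sketch with hypothetical helper names. The first function checks a dataset for missing model inputs before scoring. The second and third work out which columns to reject and apply that choice to the ML task settings. `freeze_features` assumes `mltask` is a handle obtained as described earlier, and that feature reduction has already been turned off in the design settings before retraining.

```python
def missing_model_inputs(required_features, dataset_columns):
    """Return the required features absent from the dataset, in order."""
    available = set(dataset_columns)
    return [f for f in required_features if f not in available]


def columns_to_reject(all_columns, keep_features, target):
    """Return every column that is neither the target nor a kept feature."""
    keep = set(keep_features)
    return [c for c in all_columns if c != target and c not in keep]


def freeze_features(mltask, all_columns, keep_features, target):
    """Reject unwanted columns and enable the kept ones on an ML task.

    Not executed here: it needs a real ML task handle from DSS.
    """
    settings = mltask.get_settings()
    for col in columns_to_reject(all_columns, keep_features, target):
        settings.reject_feature(col)
    for col in keep_features:
        settings.use_feature(col)
    settings.save()
```

After `freeze_features(...)`, retraining should use only the kept columns, and running `missing_model_inputs(...)` on each new dataset before scoring catches the missing-column error early.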