Feature Importance Drift from MES
Hey there! Can anyone point me to a reference that covers the details of the feature importance drift calculation in the MES?
I understand the input data drift is given by a classification model that tries to tell "model testing data" (0) apart from "new data" (1), and then an accuracy score is computed. If the accuracy is above 50%, the thought is that the base feature set is different from the current one.
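A minimal sketch of that idea, not Dataiku's actual implementation: the random forest, the 5-fold cross-validation, the synthetic data, and the helper name `drift_accuracy` are all assumptions made for illustration. It also assumes both datasets have the same number of rows, so that 50% really is the "cannot tell them apart" baseline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def drift_accuracy(X_ref, X_new, seed=0):
    """Cross-validated accuracy of a classifier separating reference rows (0) from new rows (1)."""
    X = np.vstack([X_ref, X_new])
    # Label each row by its origin: 0 = model testing data, 1 = new data.
    y = np.concatenate([np.zeros(len(X_ref), dtype=int),
                        np.ones(len(X_new), dtype=int)])
    clf = RandomForestClassifier(n_estimators=50, random_state=seed)
    # Stratified 5-fold CV keeps both origins represented in every fold.
    return cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()


# Synthetic example: the first feature of the new data is shifted by 3 standard deviations.
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(500, 3))
X_new = rng.normal(size=(500, 3))
X_new[:, 0] += 3.0

accuracy = drift_accuracy(X_ref, X_new)
print(f"drift model accuracy: {accuracy:.2f}")
```

An accuracy close to 0.5 would mean the classifier cannot distinguish the two datasets; the further above 0.5 it gets, the more the inputs have shifted.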
My question is: how exactly is the feature importance calculated with the new dataset, in order to get a drift %? I think the base feature importance is just pulled from the original model.
I'm having trouble finding a good reference. I guess I could dig into the original plugin's git repo, but I'm not sure the calculations are the same, since that plugin was deprecated.
Any clarification would be appreciated!
CJ
Operating system used: Ubuntu
Best Answer
Hi! Thanks for the reply @JordanB
In cases where the drift model has an accuracy larger than 50%...
If none of the features that are important in the original model have importance in the drift model, then no worries. If the drift model assigns high importance to features that are also important in the original model, then you need to check that the model is still relevant!
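That rule of thumb could be sketched roughly as follows. The importance values, the feature names, and the 0.1 cut-offs are made up for illustration; they are not values used by DSS.

```python
def features_to_review(original_importance, drift_importance,
                       orig_threshold=0.1, drift_threshold=0.1):
    """Return features that matter in BOTH the original model and the drift model."""
    flagged = []
    for name, orig in original_importance.items():
        # A feature missing from the drift model is treated as having no drift importance.
        drift = drift_importance.get(name, 0.0)
        if orig >= orig_threshold and drift >= drift_threshold:
            flagged.append(name)
    # Most drifted first.
    return sorted(flagged, key=lambda n: drift_importance[n], reverse=True)


# Hypothetical importances (each set sums to 1).
original = {"age": 0.45, "income": 0.30, "zip_code": 0.05, "channel": 0.20}
drift = {"age": 0.05, "income": 0.60, "zip_code": 0.30, "channel": 0.05}

to_review = features_to_review(original, drift)
print(to_review)
```

Here `zip_code` drifted but barely matters to the original model, and `age` matters but did not drift, so only `income` is flagged for review.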
Thanks, CJ
Answers
JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 296 Dataiker
Hi @cmjurs,
In the feature drift importance plot, the importance of a categorical variable is the sum of the scores of each of its values. The plot shows how important each feature used in the trained model is in the drift detection model. The x-axis is the original feature importance: the further to the right, the more important the feature was in the model. The y-axis is the feature's importance in the drift detection model. Look for the most drifted features in the top-right quadrant: these are both important to the original model and important for telling the two datasets apart.
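As a rough illustration of the categorical sum, assuming the drift model sees one encoded column per category value: the column names, the scores, and the `column_to_feature` mapping below are hypothetical, not taken from DSS.

```python
from collections import defaultdict


def aggregate_importance(column_importance, column_to_feature):
    """Sum per-column importances back onto the feature each column was derived from."""
    totals = defaultdict(float)
    for column, score in column_importance.items():
        # Columns without a mapping (e.g. numeric features) keep their own name.
        totals[column_to_feature.get(column, column)] += score
    return dict(totals)


# Hypothetical importances from a drift model trained on encoded columns.
column_importance = {
    "color=red": 0.10,
    "color=blue": 0.25,
    "color=green": 0.05,
    "age": 0.60,
}
column_to_feature = {
    "color=red": "color",
    "color=blue": "color",
    "color=green": "color",
}

feature_importance = aggregate_importance(column_importance, column_to_feature)
for feature, score in feature_importance.items():
    print(f"{feature}: {score:.2f}")
```

The three `color=...` columns collapse into a single `color` score, which is the value that would be plotted on the y-axis for that feature.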
I hope this helps!
Thanks,
Jordan