Custom code metric
Hey, I want to code a custom metric for my models.
I want to code a kind of AUC metric with a weight for each class.
At the moment I have 8 classes, with these weights:
w8, w7, w6, w5, w4, w3, w2, w1 = 0.13, 0.25, 0.38, 0.50, 0.63, 0.75, 0.88, 1
My 8 classes are named C8, C7, C6, ..., C1. Their mapped values are 0 for C8, 1 for C7, ..., 7 for C1.
So I wrote this code:
from sklearn.metrics import roc_auc_score

def score(y_valid, y_pred):
    """
    Custom scoring function.
    Must return a float quantifying the estimator prediction quality.
      y_valid is a pandas Series
      y_pred is a numpy ndarray with shape:
        (nb_records,) for regression problems and classification problems
          where 'needs probas' (see below) is false
          (for classification, the values are the numeric class indexes)
        (nb_records, nb_classes) for classification problems where
          'needs probas' is true
      [optional] X_valid is a dataframe with shape (nb_records, nb_input_features)
      [optional] sample_weight is a numpy ndarray with shape (nb_records,)
        NB: this option requires a variable set as "Sample weights"
    """
    w8, w7, w6, w5, w4, w3, w2, w1 = 0.13, 0.25, 0.38, 0.50, 0.63, 0.75, 0.88, 1
    c8 = w8 * roc_auc_score((y_valid==0), y_pred)
    c7 = w7 * roc_auc_score((y_valid==1), y_pred)
    c6 = w6 * roc_auc_score((y_valid==2), y_pred)
    c5 = w5 * roc_auc_score((y_valid==3), y_pred)
    c4 = w4 * roc_auc_score((y_valid==4), y_pred)
    c3 = w3 * roc_auc_score((y_valid==5), y_pred)
    c2 = w2 * roc_auc_score((y_valid==6), y_pred)
    c1 = w1 * roc_auc_score((y_valid==7), y_pred)
    return (c1+c2+c3+c4+c5+c6+c7+c8)
But it doesn't really work, and I don't quite understand how the API works here.
Can anyone help me out with that?
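A likely cause, assuming 'needs probas' is enabled: y_pred then has shape (nb_records, nb_classes), but each roc_auc_score call above receives the whole matrix instead of the probability column of its own class, and roc_auc_score raises an error for a binary target paired with a multi-column score. A minimal sketch of the same weighted sum, slicing one column per class (it assumes column k of y_pred is the probability of mapped class k, i.e. C8 for k=0 up to C1 for k=7; the name CLASS_WEIGHTS is illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Weights indexed by mapped class value: index 0 is C8 (w8), ..., index 7 is C1 (w1)
CLASS_WEIGHTS = [0.13, 0.25, 0.38, 0.50, 0.63, 0.75, 0.88, 1]


def score(y_valid, y_pred):
    """Weighted sum of one-vs-rest AUCs.

    Assumes 'needs probas' is true, so y_pred has shape (nb_records, nb_classes).
    """
    y_true = np.asarray(y_valid)  # works for a pandas Series or a numpy array
    total = 0.0
    for k, weight in enumerate(CLASS_WEIGHTS):
        # Class k versus all others, scored with the predicted probability of class k only
        total += weight * roc_auc_score(y_true == k, y_pred[:, k])
    return total
```

Each class has to appear in y_valid (alongside other classes), otherwise roc_auc_score raises an error because the AUC is undefined for a single-class target.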
Answers

Hi Fragan,
How are you computing these weights? The standard way in the literature to compute a weighted AUC is to weight each class by its support, i.e. the number of true instances of that label. That is already available as a parameter of the scikit-learn roc_auc_score function:
average="weighted" instead of the default average="macro"
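In scikit-learn 0.20.X, roc_auc_score only accepts binary targets or multilabel targets in indicator format, so one way to use average="weighted" on an 8-class target is to one-hot encode the labels first with label_binarize. A sketch assuming y_pred holds class probabilities in mapped-class order; the helper name weighted_auc and the toy data are hypothetical. It also checks that "weighted" equals the support-weighted mean of the per-class AUCs:

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize


def weighted_auc(y_valid, y_pred, n_classes=8):
    """Hypothetical helper: one-vs-rest AUC per class, averaged by class support."""
    # One indicator column per class: 1 where the record belongs to that class, else 0
    y_onehot = label_binarize(np.asarray(y_valid), classes=list(range(n_classes)))
    return roc_auc_score(y_onehot, y_pred, average="weighted")


# Toy, deliberately imbalanced data: class k appears a different number of times
y = np.repeat(np.arange(8), [5, 10, 15, 20, 25, 30, 40, 55])
rng = np.random.RandomState(0)
probas = rng.dirichlet(np.ones(8), size=len(y))

# Same number computed by hand: per-class AUCs averaged with support as weights
y_onehot = label_binarize(y, classes=list(range(8)))
per_class = roc_auc_score(y_onehot, probas, average=None)
support = y_onehot.sum(axis=0)
print(weighted_auc(y, probas), np.average(per_class, weights=support))
```

Because the weights are the class counts, the frequent classes dominate this average, which is the opposite of what arbitrary weights favoring rare classes would do.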
If you want to use a different weighting strategy, you may want to create a modified version of the scikit-learn roc_auc_score function, which can be found here: https://github.com/scikit-learn/scikit-learn/blob/bddd9257f39f190fec3d72872cff73c2b3cc2734/sklearn/metrics/ranking.py#L244
In short, that would be a weighting over several calls to _binary_clf_curve (see link above).
Happy to elaborate more to pave the way!
Cheers,
Alex Combessie

I want to give higher weights to the classes with less data, so I'm assigning the weights arbitrarily.

Alright.
I have looked into the implications of this change in the version of scikit-learn currently used by DSS 6.0 for the Visual ML interface, which is scikit-learn 0.20.X.
Looking at the scikit-learn code, you can see that roc_auc_score (https://github.com/scikit-learn/scikit-learn/blob/bddd9257f39f190fec3d72872cff73c2b3cc2734/sklearn/metrics/ranking.py#L244) calls _average_binary_score (https://github.com/scikit-learn/scikit-learn/blob/bddd9257f39f190fec3d72872cff73c2b3cc2734/sklearn/metrics/base.py#L93).
_average_binary_score takes the "average" parameter, which controls how the per-class scores are weighted.
So it should be enough to customize this part with your own weights.
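In the 0.20 code linked above, _average_binary_score computes one binary score per class and finishes with np.average(score, weights=...). One way to plug in custom weights without copying scikit-learn internals is to request the per-class AUCs with average=None and do that last averaging step yourself. A sketch, assuming y_pred holds class probabilities in mapped-class order and reusing the weights from the question; custom_weighted_auc is a hypothetical helper name:

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# Weights per mapped class value: index 0 is C8 (w8), ..., index 7 is C1 (w1)
CLASS_WEIGHTS = np.array([0.13, 0.25, 0.38, 0.50, 0.63, 0.75, 0.88, 1])


def custom_weighted_auc(y_valid, y_pred, weights=CLASS_WEIGHTS):
    """Hypothetical helper: one-vs-rest AUC per class, averaged with custom weights."""
    classes = list(range(len(weights)))
    y_onehot = label_binarize(np.asarray(y_valid), classes=classes)
    # average=None returns one AUC per class column instead of a single number
    per_class = roc_auc_score(y_onehot, y_pred, average=None)
    # Weighted mean, normalized by sum(weights), like the last step of _average_binary_score
    return np.average(per_class, weights=weights)


def score(y_valid, y_pred):
    # DSS custom scoring entry point, same signature as in the question
    return custom_weighted_auc(y_valid, y_pred)
```

Since np.average divides by the sum of the weights, the result stays between 0 and 1 like a regular AUC, whereas the plain weighted sum in the question can reach 4.52. Dividing by a constant does not change how models rank against each other.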
Hope it helps,
Alex Combessie