Custom code metric

Fragan
Level 1

Hey, I want to code a custom metric for my models.

I want to code a kind of AUC metric with a weight for each class.

At the moment I have 8 classes, for which I have these weights:

w8, w7, w6, w5, w4, w3, w2, w1 = 0.13, 0.25, 0.38, 0.50, 0.63, 0.75, 0.88, 1

My 8 class names are C8, C7, C6, ..., C1. Their mapped values are 0 for C8, 1 for C7, ..., 7 for C1.

So I wrote this code:

from sklearn.metrics import roc_auc_score

def score(y_valid, y_pred):
    """
    Custom scoring function.
    Must return a float quantifying the estimator prediction quality.
      - y_valid is a pandas Series
      - y_pred is a numpy ndarray with shape:
           - (nb_records,) for regression problems and classification problems
             where 'needs probas' (see below) is false
             (for classification, the values are the numeric class indexes)
           - (nb_records, nb_classes) for classification problems where
             'needs probas' is true
      - [optional] X_valid is a dataframe with shape (nb_records, nb_input_features)
      - [optional] sample_weight is a numpy ndarray with shape (nb_records,)
                   NB: this option requires a variable set as "Sample weights"
    """

    w8, w7, w6, w5, w4, w3, w2, w1 = 0.13, 0.25, 0.38, 0.50, 0.63, 0.75, 0.88, 1

    c8 = w8 * roc_auc_score((y_valid==0), y_pred)
    c7 = w7 * roc_auc_score((y_valid==1), y_pred)
    c6 = w6 * roc_auc_score((y_valid==2), y_pred)
    c5 = w5 * roc_auc_score((y_valid==3), y_pred)
    c4 = w4 * roc_auc_score((y_valid==4), y_pred)
    c3 = w3 * roc_auc_score((y_valid==5), y_pred)
    c2 = w2 * roc_auc_score((y_valid==6), y_pred)
    c1 = w1 * roc_auc_score((y_valid==7), y_pred)

    return (c1+c2+c3+c4+c5+c6+c7+c8)

 

But it doesn't really work. I don't quite understand how the API works here.

Can anyone help me out with that?

3 Replies
Alex_Combessie
Dataiker Alumni

Hi Fragan,

How are you computing these weights? The standard way in the literature to compute a weighted AUC is to weight each class by its support, i.e. the number of true instances of that label. That is already available in scikit-learn's roc_auc_score function through its average parameter:

average="weighted" instead of "macro" (the default)
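As a rough illustration of that parameter, here is a minimal sketch on synthetic data. The labels, scores, and class counts below are made up for the example. It assumes the true labels are first one-hot encoded into a label-indicator matrix with label_binarize, since that is the multi-label input format roc_auc_score averages over.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

rng = np.random.RandomState(0)
n_classes = 8

# Synthetic, imbalanced labels (illustrative counts, not from the thread).
y_true = np.repeat(np.arange(n_classes), [40, 30, 20, 15, 10, 8, 6, 4])

# Synthetic scores: random noise plus a boost on the true class column.
y_score = rng.rand(len(y_true), n_classes)
y_score[np.arange(len(y_true)), y_true] += 0.5

# One column per class, 1 where the record belongs to that class.
Y = label_binarize(y_true, classes=np.arange(n_classes))

macro_auc = roc_auc_score(Y, y_score, average="macro")        # plain mean of per-class AUCs
weighted_auc = roc_auc_score(Y, y_score, average="weighted")  # mean weighted by class support

# The weighted value can be reproduced by hand from the per-class AUCs.
per_class_auc = roc_auc_score(Y, y_score, average=None)
support = Y.sum(axis=0)
manual_weighted_auc = np.average(per_class_auc, weights=support)
```

Note that "weighted" gives more influence to the frequent classes, which is the opposite of favouring rare ones.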

If you want to use a different weighting strategy, you may want to create a modified version of scikit-learn's roc_auc_score function, which can be found here: https://github.com/scikit-learn/scikit-learn/blob/bddd9257f39f190fec3d72872cff73c2b3cc2734/sklearn/m...

In short, that would be a weighting over several calls to _binary_clf_curve (see link above).

Happy to elaborate more to pave the way!

Cheers,

Alex Combessie

Fragan
Level 1
Author

I want to give higher weights to the classes with less data, so I'm assigning the weights arbitrarily.

Alex_Combessie
Dataiker Alumni

Alright.

I have looked into the implications of this change in the current version of scikit-learn used by DSS 6.0 for the Visual ML interface. That version is scikit-learn 0.20.X.

Looking at the scikit-learn code, you can see that roc_auc_score (https://github.com/scikit-learn/scikit-learn/blob/bddd9257f39f190fec3d72872cff73c2b3cc2734/sklearn/m...) calls _average_binary_score (https://github.com/scikit-learn/scikit-learn/blob/bddd9257f39f190fec3d72872cff73c2b3cc2734/sklearn/m...).

_average_binary_score effectively uses an "average" parameter which controls the weighting.

So it should be enough to customize this part with your own weights.
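One way this could look as a custom score function, without touching scikit-learn internals, is sketched below. It is not the exact modification described above but an equivalent one-vs-rest loop, and it rests on a few assumptions: 'needs probas' is enabled so that y_pred has shape (nb_records, nb_classes), column k of y_pred holds the probability of the class whose mapped value is k, and the weights are divided by their sum so the result stays between 0 and 1. The main difference from the function in the original post is that each binary AUC receives a single column of y_pred rather than the whole matrix.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Weights from the original post, indexed by mapped value: 0 -> C8, ..., 7 -> C1.
CLASS_WEIGHTS = np.array([0.13, 0.25, 0.38, 0.50, 0.63, 0.75, 0.88, 1.0])

def score(y_valid, y_pred):
    """Custom-weighted mean of one-vs-rest AUCs (assumes 'needs probas' is true)."""
    y_true = np.asarray(y_valid)
    y_pred = np.asarray(y_pred)
    if y_pred.ndim != 2 or y_pred.shape[1] != len(CLASS_WEIGHTS):
        raise ValueError("y_pred must have shape (nb_records, %d)" % len(CLASS_WEIGHTS))

    total, weight_sum = 0.0, 0.0
    for k, w in enumerate(CLASS_WEIGHTS):
        is_k = (y_true == k)
        # AUC is undefined if the class is absent from (or is the only class in)
        # this validation fold, so such classes are skipped.
        if is_k.all() or not is_k.any():
            continue
        # Binary AUC of "class k vs. the rest", using only column k of the scores.
        total += w * roc_auc_score(is_k, y_pred[:, k])
        weight_sum += w

    if weight_sum == 0.0:
        raise ValueError("no class has both positive and negative records")
    # Normalising by the weights actually used is a choice made for this sketch.
    return total / weight_sum
```

Skipping classes that are missing from a fold avoids an error from roc_auc_score, at the cost of making scores from different folds slightly less comparable.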

Hope it helps,

Alex Combessie

 

 
