## Custom code metric

Hey, I want to code a custom metric for my models.

I want to code a kind of AUC metric with a weight for each class.

At the moment I have 8 classes, with these weights:

```python
w8, w7, w6, w5, w4, w3, w2, w1 = 0.13, 0.25, 0.38, 0.50, 0.63, 0.75, 0.88, 1
```

My 8 class names are C8, C7, C6, ..., C1. Their mapped values are 0 for C8, 1 for C7, ..., 7 for C1.

So I wrote this code:

```python
from sklearn.metrics import roc_auc_score

def score(y_valid, y_pred):
    """
    Custom scoring function.
    Must return a float quantifying the estimator prediction quality.
      - y_valid is a pandas Series
      - y_pred is a numpy ndarray with shape:
           - (nb_records,) for regression problems and classification problems
             where 'needs probas' (see below) is false
             (for classification, the values are the numeric class indexes)
           - (nb_records, nb_classes) for classification problems where
             'needs probas' is true
      - [optional] X_valid is a dataframe with shape (nb_records, nb_input_features)
      - [optional] sample_weight is a numpy ndarray with shape (nb_records,)
            NB: this option requires a variable set as "Sample weights"
    """
    w8, w7, w6, w5, w4, w3, w2, w1 = 0.13, 0.25, 0.38, 0.50, 0.63, 0.75, 0.88, 1
    c8 = w8 * roc_auc_score((y_valid==0), y_pred)
    c7 = w7 * roc_auc_score((y_valid==1), y_pred)
    c6 = w6 * roc_auc_score((y_valid==2), y_pred)
    c5 = w5 * roc_auc_score((y_valid==3), y_pred)
    c4 = w4 * roc_auc_score((y_valid==4), y_pred)
    c3 = w3 * roc_auc_score((y_valid==5), y_pred)
    c2 = w2 * roc_auc_score((y_valid==6), y_pred)
    c1 = w1 * roc_auc_score((y_valid==7), y_pred)
    return (c1+c2+c3+c4+c5+c6+c7+c8)
```

But it doesn't really work. I don't quite understand how the API works here.

Can anyone help me out with that?
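A likely reason the function above fails, offered as a hedged diagnosis rather than a confirmed one: with 'needs probas' enabled, `y_pred` has shape (nb_records, nb_classes), and passing the whole array to `roc_auc_score` alongside a binary target raises an error; with 'needs probas' disabled, `y_pred` holds class indexes, which are not per-class scores, so the AUCs are not meaningful. The sketch below scores each class against its own probability column, `y_pred[:, k]`. It assumes 'needs probas' is ticked, that column k corresponds to mapped class k, and that every class appears in the validation set. `CLASS_WEIGHTS` is a name introduced here for the question's weights, and dividing by the weight total is an optional choice that keeps the result on the usual [0, 1] AUC scale.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Assumed mapping: index 0 = C8 (weight 0.13), ..., index 7 = C1 (weight 1.0)
CLASS_WEIGHTS = [0.13, 0.25, 0.38, 0.50, 0.63, 0.75, 0.88, 1.0]

def score(y_valid, y_pred):
    """Weighted one-vs-rest AUC.

    Assumes 'needs probas' is enabled, so y_pred has shape
    (nb_records, nb_classes) and column k is the probability of class k.
    """
    y_true = np.asarray(y_valid)
    total = 0.0
    for k, w in enumerate(CLASS_WEIGHTS):
        # Binary target "is class k", scored by the probability of class k only
        total += w * roc_auc_score(y_true == k, y_pred[:, k])
    # Optional normalisation: a weighted mean instead of a weighted sum
    return total / sum(CLASS_WEIGHTS)
```

Without the final division the function returns the weighted sum, as in the original; either version ranks models the same way.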


Hi Fragan,

How are you computing these weights? The standard way in the literature to compute a weighted AUC is to weight each class by its support, i.e. the number of true instances of that label in the evaluated data. This is already available as a parameter of scikit-learn's roc_auc_score function:

`average="weighted"` instead of `"macro"` (the default)
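For reference, a minimal sketch of this built-in support weighting. It assumes the labels are mapped class indexes 0 to n_classes - 1 (with at least three classes) and that `y_proba` has one probability column per class in that order. Building a one-hot indicator matrix with `label_binarize` is a route that should also work on scikit-learn 0.20, which predates the `multi_class` argument of `roc_auc_score` (added in 0.22). `weighted_auc_by_support` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

def weighted_auc_by_support(y_valid, y_proba, n_classes):
    # One-hot encode class indexes so each column is a binary "is class k" task
    y_bin = label_binarize(np.asarray(y_valid), classes=list(range(n_classes)))
    # average="weighted": per-class AUCs averaged with weights equal to the
    # number of true instances of each class in y_valid
    return roc_auc_score(y_bin, y_proba, average="weighted")
```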

If you want to use a different weighting strategy, you may want to create a modified version of scikit-learn's roc_auc_score function, which can be found here: https://github.com/scikit-learn/scikit-learn/blob/bddd9257f39f190fec3d72872cff73c2b3cc2734/sklearn/m...

In short, that would be a weighting over several calls to _binary_clf_curve (see link above).
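If only the weights need to change, one shortcut that avoids copying private scikit-learn code is to request the per-class scores with `average=None` and average them yourself with `np.average`. This is a sketch, assuming class indexes 0 to 7 as in the question and one probability column per class; `custom_weighted_auc` and `CLASS_WEIGHTS` are hypothetical names. Every class must appear in `y_valid`, otherwise scikit-learn raises a `ValueError`, since AUC is undefined for a class with no positive examples.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# Hypothetical weights indexed by mapped class value (0 = C8, ..., 7 = C1)
CLASS_WEIGHTS = np.array([0.13, 0.25, 0.38, 0.50, 0.63, 0.75, 0.88, 1.0])

def custom_weighted_auc(y_valid, y_proba, class_weights=CLASS_WEIGHTS):
    n_classes = len(class_weights)
    y_bin = label_binarize(np.asarray(y_valid), classes=list(range(n_classes)))
    # average=None returns one one-vs-rest AUC per class column
    per_class_auc = roc_auc_score(y_bin, y_proba, average=None)
    # np.average divides by the sum of the weights, so the result stays in [0, 1]
    return float(np.average(per_class_auc, weights=class_weights))
```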

Happy to elaborate more to pave the way!

Cheers,

Alex Combessie


I want to give higher weights to the classes with less data, so I am assigning the weights arbitrarily.
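If you would rather derive such weights than pick them by hand, one common alternative (not what the poster used) is inverse class frequency: count the training rows of each class and make each weight proportional to 1/count. A minimal sketch, assuming mapped class indexes 0 to n_classes - 1 and rescaling so the largest weight is 1, like the hand-picked weights in the question; `inverse_frequency_weights` is a hypothetical name.

```python
import numpy as np

def inverse_frequency_weights(y_train, n_classes):
    # Number of training rows for each mapped class index 0..n_classes-1
    counts = np.bincount(np.asarray(y_train), minlength=n_classes).astype(float)
    # Rarer classes get larger weights; classes with no rows get weight 0
    weights = np.zeros(n_classes)
    present = counts > 0
    weights[present] = 1.0 / counts[present]
    # Rescale so the largest weight is exactly 1
    return weights / weights.max()
```

For example, `inverse_frequency_weights([0, 0, 0, 0, 1, 1, 2], 3)` gives `[0.25, 0.5, 1.0]`.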


Alright.

I have looked into the implications of this change in the current version of scikit-learn used by DSS 6.0 for the Visual ML interface. That version is scikit-learn 0.20.X.

Looking at the scikit-learn code, you can see that **roc_auc_score** (https://github.com/scikit-learn/scikit-learn/blob/bddd9257f39f190fec3d72872cff73c2b3cc2734/sklearn/m...) calls **_average_binary_score** (https://github.com/scikit-learn/scikit-learn/blob/bddd9257f39f190fec3d72872cff73c2b3cc2734/sklearn/m...).

**_average_binary_score** effectively uses an "average" parameter which controls the weighting.

So it should be enough to customize this part with your own weights.
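One way to read this suggestion, as a sketch rather than a patch of scikit-learn's private code: keep the one-vs-rest structure of `_average_binary_score` (score each class column with a binary metric, then combine the scores) but replace the built-in averaging options with your own class weights. `custom_average_binary_score` is a hypothetical helper. It assumes class indexes 0 to n_classes - 1, at least three classes, and one score column per class, and it accepts any binary metric called as `metric(y_true, y_score)`, such as `roc_auc_score` or `average_precision_score`.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.preprocessing import label_binarize

def custom_average_binary_score(binary_metric, y_valid, y_score, class_weights):
    """Hypothetical helper: score each class one-vs-rest with binary_metric,
    then average the per-class scores with user-supplied class weights."""
    class_weights = np.asarray(class_weights, dtype=float)
    n_classes = len(class_weights)
    y_bin = label_binarize(np.asarray(y_valid), classes=list(range(n_classes)))
    scores = np.empty(n_classes)
    for k in range(n_classes):
        # Column k: binary labels and scores for class k against the rest
        scores[k] = binary_metric(y_bin[:, k], y_score[:, k])
    # Weighted mean of the per-class scores
    return float(np.average(scores, weights=class_weights))
```

With equal weights this reproduces `average="macro"`; in a DSS custom metric, `score` could return `custom_average_binary_score(roc_auc_score, y_valid, y_pred, weights)` with 'needs probas' enabled.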

Hope it helps,

Alex Combessie