
Using a dataset column as a custom metric



I'm building a model to estimate betting exchange trades. As part of the scoring metric, I want to use the amount paid by the trade as the gain in the cost matrix, and the cost of the trade (if it loses, i.e., falls below the buying price) as the negative gain (-1 in this case, assuming $1 per trade).

For example, if I "bet" on an outcome for $1, and it pays $5, I want to use $5 as the "correct prediction" gain, as opposed to using a fixed value. Likewise, the next row in the dataset might pay $15, and so that should be the gain. A losing trade would have a gain of -1. The "amount paid" (in the case of a correct prediction) is available as a field in the dataset.

Is there a code sample that demonstrates how this can be done as a custom scoring function? From what I understand, this is the method I need to flesh out:

def score(y_valid, y_pred):
    """
    Custom scoring function.
    Must return a float quantifying the estimator prediction quality.
      - y_valid is a pandas Series
      - y_pred is a numpy ndarray with shape:
           - (nb_records,) for regression problems and classification problems
             where 'needs probas' (see below) is false
             (for classification, the values are the numeric class indexes)
           - (nb_records, nb_classes) for classification problems where
             'needs probas' is true
      - [optional] X_valid is a dataframe with shape (nb_records, nb_input_features)
      - [optional] sample_weight is a numpy ndarray with shape (nb_records,)
                   NB: this option requires a variable set as "Sample weights"
    """


1 Reply


Thanks for the description of this interesting use case. There is no built-in code sample to do this, but the idea is fairly simple to implement:

1. Use y_valid and y_pred to detect winning/losing trades

2. Use the optional parameter X_valid to retrieve the "amount paid" in case of a correct prediction

3. Combine 1. and 2. to compute an array of gains per trade (I assume that one row = one trade)

4. Aggregate it into sum/average of gains for all trades
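
The four steps above could be sketched roughly as follows. This is only an illustration under several assumptions, not a tested recipe: it assumes a binary classification task with 'needs probas' set to false, that class index 1 means "the trade wins", that predicting 0 means no trade is placed (gain of 0), and that the payout is available in X_valid under a hypothetical column name "amount_paid" with its raw, unscaled value. PAYOUT_COLUMN and STAKE are names made up for this example.

```python
import numpy as np
import pandas as pd

PAYOUT_COLUMN = "amount_paid"  # hypothetical column name, adjust to your dataset
STAKE = 1.0                    # assumed cost of a losing trade ($1 per trade)


def score(y_valid, y_pred, X_valid=None, sample_weight=None):
    """Average gain per trade (one row = one trade)."""
    if X_valid is None or PAYOUT_COLUMN not in X_valid.columns:
        raise ValueError("X_valid must contain the column %r" % PAYOUT_COLUMN)

    # Step 1: detect winning and losing trades.
    actual_win = np.asarray(y_valid) == 1   # assumption: class 1 = trade wins
    bet_placed = np.asarray(y_pred) == 1    # assumption: class 1 = place the trade

    # Step 2: retrieve the amount paid for each row.
    payout = X_valid[PAYOUT_COLUMN].to_numpy(dtype=float)

    # Step 3: gain per trade. Winning bet earns the payout, losing bet
    # costs the stake, and no bet placed is assumed to yield 0.
    gains = np.where(bet_placed & actual_win, payout,
                     np.where(bet_placed, -STAKE, 0.0))

    # Step 4: aggregate into the average gain over all trades.
    return float(gains.mean())
```

Replacing gains.mean() with gains.sum() would give the total gain instead of the average. Whether the payout column should also be used as an input feature is a separate question, since it is only known for winning trades.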

Hope it helps,