
Custom Metric

josurriola
Level 2

Hey there, I want to use the following metric as a custom score function in the visual ML tools, but it seems to be failing:

from scipy.stats import ks_2samp
from sklearn.metrics import make_scorer

def ks_stat(y, yhat):
    """
    This function calculates the Kolmogorov-Smirnov (KS) statistic.
    Params
    ------
    y: list-array like
        a list or an array of a binary or continuous variable.
    yhat: list-array like
        the predicted values for the same records.
    """
    return ks_2samp(yhat[y == 1], yhat[y != 1]).statistic


y_hat = clf.predict_proba(X_test)

ks_scorer = make_scorer(ks_stat, needs_proba=True)

Turribeach

Please post your code snippet using a code block (see icon </> in the toolbar). Can you please post the error you get?

TomWiley
Dataiker

Hi!

I've had a look at the code, and I think I've got a solution for you:

from scipy.stats import ks_2samp

def ks_stat(y, yhat):
    """
    This function calculates the Kolmogorov-Smirnov (KS) statistic.
    Params
    ------
    y: list-array like
        a list or an array of a binary or continuous variable.
    yhat: list-array like
        the predicted values for the same records.
    """
    return ks_2samp(yhat[y == 1], yhat[y != 1]).statistic

def score(y_valid, y_pred):
    """
    Custom scoring function.
    Must return a float quantifying the estimator prediction quality.
    - y_valid is a pandas Series
    - y_pred is a numpy ndarray with shape:
        - (nb_records,) for regression problems and classification problems
            where 'needs probas' (see below) is false
            (for classification, the values are the numeric class indexes)
        - (nb_records, nb_classes) for classification problems where
            'needs probas' is true
    """
    return ks_stat(y_valid, y_pred)

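This code snippet requires the `Needs Probability` setting to be Off. With it Off, `y_pred` holds the predicted class indexes rather than probabilities (see the docstring above), so the KS statistic is computed on hard predictions. I've had a quick glance at the Kolmogorov-Smirnov (KS) metric, and this appears to be correct, but I'm not 100% sure here.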
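
For a binary classifier, the KS statistic is more commonly computed on predicted probabilities than on hard labels. A possible variant for `Needs Probability` switched On is sketched below. It rests on assumptions not confirmed in this thread: the task is binary, and column 1 of `y_pred` holds the probability of class index 1. Check that column ordering on your own model before relying on it.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def score(y_valid, y_pred):
    """
    KS statistic on predicted probabilities (sketch).
    Assumes 'needs probas' is On and the task is binary, so y_pred has
    shape (nb_records, 2) and column 1 holds the probability of class index 1.
    """
    y = np.asarray(y_valid)
    proba_pos = np.asarray(y_pred)[:, 1]  # assumed positive-class column
    # Compare how the positive-class probabilities are distributed for
    # records with y == 1 versus all other records
    return float(ks_2samp(proba_pos[y == 1], proba_pos[y != 1]).statistic)


# Quick local check with made-up data: the two classes are perfectly separated
y_valid = pd.Series([0, 0, 1, 1])
y_pred = np.array([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
print(score(y_valid, y_pred))  # 1.0
```

Returning a plain `float` matches the docstring's requirement that the function return a float.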


From what I can tell, the problem in the original metric code was that it didn't define a `score` function with the expected signature. In Dataiku DSS, we generally expect a score function with the following signature (similar to the built-in scikit-learn score functions):

def score(y_valid, y_pred):
    ...

(We also accept score functions with an optional `sample_weight` parameter or an optional `X_valid` parameter, but in all cases `y_valid` and `y_pred` are required.)
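
For the optional `sample_weight` form, note that `scipy.stats.ks_2samp` does not accept weights, so a weighted KS statistic has to be computed by hand from weighted empirical CDFs. The sketch below assumes DSS passes the weights as a keyword argument named `sample_weight` that is `None` when no weighting is configured, and it works on a 1-D `y_pred`, as with `Needs Probability` Off. `weighted_ks` is a hypothetical helper, not part of DSS or SciPy.

```python
import numpy as np
from scipy.stats import ks_2samp


def weighted_ks(y, yhat, weights):
    """Hypothetical helper: weighted two-sample KS statistic between the
    yhat values of records with y == 1 and records with y != 1."""
    y = np.asarray(y)
    yhat = np.asarray(yhat, dtype=float)
    weights = np.asarray(weights, dtype=float)
    grid = np.unique(yhat)  # evaluate both CDFs at every observed value

    def weighted_ecdf(mask):
        values, w = yhat[mask], weights[mask]
        order = np.argsort(values)
        values = values[order]
        cum_w = np.cumsum(w[order]) / w.sum()  # assumes non-zero total weight
        # Number of values <= each grid point (ties included)
        idx = np.searchsorted(values, grid, side="right")
        return np.where(idx > 0, cum_w[np.maximum(idx - 1, 0)], 0.0)

    pos = y == 1
    return float(np.max(np.abs(weighted_ecdf(pos) - weighted_ecdf(~pos))))


def score(y_valid, y_pred, sample_weight=None):
    """Custom score with an optional sample_weight argument (name assumed)."""
    y = np.asarray(y_valid)
    yhat = np.asarray(y_pred)
    if sample_weight is None:
        # Unweighted case: same computation as ks_stat above
        return float(ks_2samp(yhat[y == 1], yhat[y != 1]).statistic)
    return weighted_ks(y, yhat, sample_weight)


# Made-up hard-label predictions; up-weighting a correctly predicted
# negative widens the gap between the two weighted CDFs
y_valid = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
print(score(y_valid, y_pred))                              # 0.5
print(score(y_valid, y_pred, sample_weight=[3, 1, 1, 1]))  # 0.75
```

With unit weights, `weighted_ks` gives the same value as `ks_2samp`, since both evaluate the two empirical CDFs at every observed value and take the largest gap.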

The sklearn `make_scorer` function is not necessary in this context.
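
One way to see why `make_scorer` isn't needed here: a scikit-learn scorer is called as `scorer(estimator, X, y)` and computes the predictions itself, whereas the DSS `score` function receives targets and predictions directly. The toy comparison below (a `LogisticRegression` on made-up data, for illustration only) shows both calling conventions returning the same KS value.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer


def ks_stat(y, yhat):
    """KS statistic between the yhat values of records with y == 1 and y != 1."""
    y, yhat = np.asarray(y), np.asarray(yhat)
    return ks_2samp(yhat[y == 1], yhat[y != 1]).statistic


# Toy binary data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)
clf = LogisticRegression().fit(X, y)

# scikit-learn convention: the scorer gets the estimator and the features,
# and (with default settings) calls clf.predict(X) internally
ks_scorer = make_scorer(ks_stat)
sklearn_style = ks_scorer(clf, X, y)

# DSS-style convention: the function gets targets and predictions directly
dss_style = ks_stat(y, clf.predict(X))

print(sklearn_style, dss_style)  # the two values match
```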

Let me know if you have any more questions!

Tom
