Custom Metric

josurriola Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 5

Hey there, I want to include the following metric in the custom score function of the visual tools and it seems to be failing:

from scipy.stats import ks_2samp
from sklearn.metrics import make_scorer

def ks_stat(y, yhat):
    """
    This function calculates the Kolmogorov-Smirnov (KS) statistic.
    Params
    ------
    y: list or array-like
        a list or an array of a binary or continuous variable.
    yhat: list or array-like
    """
    return ks_2samp(yhat[y == 1], yhat[y != 1]).statistic


y_hat = clf.predict_proba(X_test)

ks_scorer = make_scorer(ks_stat, needs_proba=True)

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,717 Neuron

    Please post your code snippet using a code block (see icon </> in the toolbar). Can you please post the error you get?

  • TomWiley
    TomWiley Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 2 Dataiker
    edited 4:25PM

    Hi!

    I've had a look at the code, and I think I've got a solution for you:

    from scipy.stats import ks_2samp

    def ks_stat(y, yhat):
        """
        This function calculates the Kolmogorov-Smirnov (KS) statistic.
        Params
        ------
        y: list or array-like
            a list or an array of a binary or continuous variable.
        yhat: list or array-like
        """
        return ks_2samp(yhat[y == 1], yhat[y != 1]).statistic
    
    def score(y_valid, y_pred):
        """
        Custom scoring function.
        Must return a float quantifying the estimator prediction quality.
        - y_valid is a pandas Series
        - y_pred is a numpy ndarray with shape:
            - (nb_records,) for regression problems and classification problems
                where 'needs probas' (see below) is false
                (for classification, the values are the numeric class indexes)
            - (nb_records, nb_classes) for classification problems where
                'needs probas' is true
        """
        return ks_stat(y_valid, y_pred)

    This code snippet requires the `Needs Probability` setting to be Off, so that `y_pred` is a one-dimensional array. I've had a quick glance at the Kolmogorov-Smirnov (KS) metric, and this appears to be correct, but I'm not 100% sure here.
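
If you would rather compute KS on predicted probabilities, here is a minimal sketch for the case where `Needs Probability` is On. It relies on the docstring above, which says `y_pred` then has shape (nb_records, nb_classes). Two things are assumptions on my part: that the positive class is encoded as 1 in `y_valid`, and that its probability sits in column 1 of `y_pred`.

```python
import numpy as np
from scipy.stats import ks_2samp


def score(y_valid, y_pred):
    """KS statistic between positive-class probabilities of the two classes."""
    y_true = np.asarray(y_valid)
    proba = np.asarray(y_pred)
    # Assumption: column 1 holds the probability of the positive class (label 1).
    pos_proba = proba[:, 1]
    result = ks_2samp(pos_proba[y_true == 1], pos_proba[y_true != 1])
    return float(result.statistic)
```

If your classes are mapped differently, adjust the column index and the label comparison accordingly.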


    From what I can tell, the problem in the original metric code was that it didn't define a `score` function with the expected signature. In Dataiku DSS, we generally expect a score function with the following signature (similar to the built-in scikit-learn score functions):

    def score(y_valid, y_pred):
        ...

    (We also accept score functions with an optional `sample_weight` parameter or an optional `X_valid` parameter, but in all cases `y_valid` and `y_pred` are required.)
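
As an illustration of the optional `sample_weight` variant, here is a rough sketch of a weighted KS score. `ks_2samp` does not take weights, so this version builds weighted empirical CDFs with NumPy instead. The helper name `_weighted_ecdf` is hypothetical. The sketch also assumes that `y_pred` is one-dimensional (`Needs Probability` Off), that the positive class is labelled 1, that both classes are present, and that DSS passes the weights through a parameter named `sample_weight`.

```python
import numpy as np


def _weighted_ecdf(values, weights, thresholds):
    """Weighted empirical CDF of `values`, evaluated at each threshold."""
    order = np.argsort(values)
    sorted_values = values[order]
    cum_weights = np.cumsum(weights[order])
    # Number of values <= each threshold.
    counts = np.searchsorted(sorted_values, thresholds, side="right")
    return np.concatenate(([0.0], cum_weights))[counts] / cum_weights[-1]


def score(y_valid, y_pred, sample_weight=None):
    """Largest gap between the weighted score CDFs of the two classes."""
    y_true = np.asarray(y_valid)
    scores = np.asarray(y_pred, dtype=float)
    if sample_weight is None:
        weights = np.ones_like(scores)  # unweighted case
    else:
        weights = np.asarray(sample_weight, dtype=float)
    pos = y_true == 1
    thresholds = np.unique(scores)
    cdf_pos = _weighted_ecdf(scores[pos], weights[pos], thresholds)
    cdf_neg = _weighted_ecdf(scores[~pos], weights[~pos], thresholds)
    return float(np.max(np.abs(cdf_pos - cdf_neg)))
```

With no weights this reduces to the usual two-sample KS statistic, so it should agree with the `ks_2samp`-based version.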

    The sklearn `make_scorer` function is not necessary in this context.

    Let me know if you have any more questions!

    Tom
