Custom Metric
Hey there, I want to include the following metric in the custom score function of the visual tools and it seems to be failing:
from scipy.stats import ks_2samp
from sklearn.metrics import make_scorer

def ks_stat(y, yhat):
    """
    This function calculates the Kolmogorov-Smirnov (KS) statistic
    Params
    ------
    y: list-array like
        a list or an array of a binary or continuous variable.
    yhat: list-array-like
    """
    return ks_2samp(yhat[y == 1], yhat[y != 1]).statistic

y_hat = clf.predict_proba(X_test)
ks_scorer = make_scorer(ks_stat, needs_proba=True)
Answers
-
Turribeach (Neuron)
Please post your code snippet using a code block (see the </> icon in the toolbar). Can you also post the error you get?
-
TomWiley (Dataiker)
Hi!
I've had a look at the code, and I think I've got a solution for you:

from scipy.stats import ks_2samp

def ks_stat(y, yhat):
    """
    This function calculates the Kolmogorov-Smirnov (KS) statistic
    Params
    ------
    y: list-array like
        a list or an array of a binary or continuous variable.
    yhat: list-array-like
    """
    return ks_2samp(yhat[y == 1], yhat[y != 1]).statistic

def score(y_valid, y_pred):
    """
    Custom scoring function.
    Must return a float quantifying the estimator prediction quality.
      - y_valid is a pandas Series
      - y_pred is a numpy ndarray with shape:
          - (nb_records,) for regression problems and classification problems
            where 'needs probas' (see below) is false
            (for classification, the values are the numeric class indexes)
          - (nb_records, nb_classes) for classification problems where
            'needs probas' is true
    """
    return ks_stat(y_valid, y_pred)
This code snippet requires the `Needs Probability` setting to be Off. I've had a quick look at the Kolmogorov-Smirnov (KS) metric, and the computation appears to be correct, but I'm not 100% sure here.
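If you would rather compute the KS statistic on predicted probabilities (that is, with `Needs Probability` turned On), a variant along the following lines might work. This is only a sketch: it assumes a binary problem, that the positive class is encoded as 1 in `y_valid`, and that column 1 of `y_pred` holds the positive-class probability. None of those assumptions are confirmed above, so please check them against your own data.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def score(y_valid, y_pred):
    """KS statistic between the scores of the positive class and the rest."""
    y_true = np.asarray(y_valid)
    y_pred = np.asarray(y_pred)
    # Assumption: with probabilities enabled, y_pred has shape
    # (nb_records, nb_classes) and column 1 is the positive class.
    if y_pred.ndim == 2:
        y_pred = y_pred[:, 1]
    # Assumption: the positive class is encoded as 1 in y_valid.
    positives = y_pred[y_true == 1]
    negatives = y_pred[y_true != 1]
    return float(ks_2samp(positives, negatives).statistic)


# Local check on made-up data: the two classes are perfectly separated.
y_valid = pd.Series([0, 0, 1, 1])
probas = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]])
print(score(y_valid, probas))
```

Converting both inputs to NumPy arrays first avoids any index-alignment surprises between the pandas Series and the ndarray.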
From what I can tell, the problem in the original metric code was that it didn't define a `score` function with the expected signature. In Dataiku DSS, we generally expect a score function with the following signature (similar to the built-in scikit-learn score functions):
def score(y_valid, y_pred): ...
(We also accept score functions with an optional `sample_weight` parameter, or an optional `X_valid` parameter, but in all cases `y_valid` and `y_pred` are required.)
The sklearn `make_scorer` function is not necessary in this context.
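For the optional `sample_weight` parameter mentioned above, note that `ks_2samp` has no weights argument, so a weighted KS statistic would have to be computed by hand. Here is a rough sketch of one way to do that with weighted empirical CDFs. It assumes `sample_weight` is array-like and aligned with `y_valid`, that `y_pred` is one-dimensional, and that the positive class is encoded as 1; these are assumptions for illustration, not documented behaviour.

```python
import numpy as np


def score(y_valid, y_pred, sample_weight=None):
    """Weighted two-sample KS statistic (positive class vs. the rest)."""
    y_true = np.asarray(y_valid)
    y_pred = np.asarray(y_pred, dtype=float)  # assumed shape: (nb_records,)
    if sample_weight is None:
        weights = np.ones(len(y_true))  # unweighted case
    else:
        weights = np.asarray(sample_weight, dtype=float)

    is_pos = y_true == 1  # assumption: positive class is encoded as 1
    grid = np.unique(y_pred)  # points where either ECDF can jump

    def weighted_ecdf(values, w):
        # Weighted share of observations with value <= each grid point.
        order = np.argsort(values)
        cum = np.concatenate([[0.0], np.cumsum(w[order])]) / w.sum()
        idx = np.searchsorted(values[order], grid, side="right")
        return cum[idx]

    gap = (weighted_ecdf(y_pred[is_pos], weights[is_pos])
           - weighted_ecdf(y_pred[~is_pos], weights[~is_pos]))
    return float(np.max(np.abs(gap)))
```

With `sample_weight=None` (or uniform weights) this should give the same value as `ks_2samp(...).statistic`. If one of the two groups is empty, the result is undefined, so you may want to guard against that case.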
Let me know if you have any more questions!
Tom