Custom Metric
Hey there, I want to include the following metric in the custom score function of the visual tools and it seems to be failing:
from scipy.stats import ks_2samp
from sklearn.metrics import make_scorer

def ks_stat(y, yhat):
    """
    This function calculates the Kolmogorov-Smirnov (KS) statistic.

    Params
    y: list or array-like
        a list or an array of a binary or continuous variable.
    yhat: list or array-like
    """
    return ks_2samp(yhat[y == 1], yhat[y != 1]).statistic

y_hat = clf.predict_proba(X_test)
ks_scorer = make_scorer(ks_stat, needs_proba=True)
Answers

Turribeach (Neuron)
Please post your code snippet using a code block (see icon </> in the toolbar). Can you please post the error you get?

TomWiley (Dataiker)
Hi!
I've had a look at the code, and I think I've got a solution for you:

from scipy.stats import ks_2samp

def ks_stat(y, yhat):
    """
    This function calculates the Kolmogorov-Smirnov (KS) statistic.

    Params
    y: list or array-like
        a list or an array of a binary or continuous variable.
    yhat: list or array-like
    """
    return ks_2samp(yhat[y == 1], yhat[y != 1]).statistic

def score(y_valid, y_pred):
    """
    Custom scoring function.
    Must return a float quantifying the estimator prediction quality.
    - y_valid is a pandas Series
    - y_pred is a numpy ndarray with shape:
        - (nb_records,) for regression problems and classification problems
          where 'needs probas' (see below) is false
          (for classification, the values are the numeric class indexes)
        - (nb_records, nb_classes) for classification problems where
          'needs probas' is true
    """
    return ks_stat(y_valid, y_pred)
This code snippet requires the `Needs Probability` setting to be Off. I've had a quick glance at the Kolmogorov-Smirnov (KS) metric, and this appears to be correct, but I'm not 100% sure here.
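If you would rather compute KS on predicted probabilities than on predicted class indexes, here is a rough sketch of what the function could look like with `Needs Probability` turned On. It relies on the `(nb_records, nb_classes)` shape described in the docstring. It assumes a binary problem, that `y_valid` is encoded as 0/1, and that column 1 of `y_pred` is the positive class; I have not verified those assumptions inside DSS, and the sample data is made up.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def score(y_valid, y_pred):
    """KS statistic between the positive-class scores of the two true classes.

    Assumes binary classification with 'needs probas' on, so y_pred has
    shape (nb_records, nb_classes).
    """
    y_true = np.asarray(y_valid)
    # Assumption: column index 1 holds the probability of the positive class.
    pos_proba = np.asarray(y_pred)[:, 1]
    stat = ks_2samp(pos_proba[y_true == 1], pos_proba[y_true != 1]).statistic
    return float(stat)


# Local sanity check on made-up data (not DSS output)
y_valid = pd.Series([0, 0, 0, 1, 1, 1])
y_pred = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.4, 0.6],
    [0.2, 0.8],
    [0.1, 0.9],
])
ks_value = score(y_valid, y_pred)
```

In this toy data the two classes are perfectly separated by score, so the statistic comes out at its maximum; real models will land somewhere between 0 and 1.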
From what I can tell, the problem in the original metric code was that it didn't define a `score` function with the expected signature. In Dataiku DSS, we generally expect a score function with the following signature (similar to the built-in scikit-learn score functions):
def score(y_valid, y_pred): ...
(We also accept score functions with an optional `sample_weight` parameter, or an optional `X_valid` parameter, but in all cases `y_valid` and `y_pred` are required.)
The sklearn `make_scorer` function is not necessary in this context.
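Since no `make_scorer` wrapper is involved, you can sanity-check the function outside DSS by calling `score` directly with inputs shaped like the ones described in the docstring. The snippet below is only a sketch: the sample values and the non-default Series index are made up, and the `np.asarray` conversions are my own addition so the boolean mask does not depend on the pandas index.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def ks_stat(y, yhat):
    """KS statistic between predictions for y == 1 and predictions for y != 1."""
    # Convert to plain arrays so the mask is positional, not index-aligned.
    y = np.asarray(y)
    yhat = np.asarray(yhat)
    return float(ks_2samp(yhat[y == 1], yhat[y != 1]).statistic)


def score(y_valid, y_pred):
    """Entry point with the (y_valid, y_pred) signature described above."""
    return ks_stat(y_valid, y_pred)


# Made-up inputs: a pandas Series of true classes and a 1-D array of
# predicted class indexes, as with `Needs Probability` set to Off.
y_valid = pd.Series([0, 0, 1, 1], index=[10, 11, 12, 13])
y_pred = np.array([0, 0, 1, 0])
result = score(y_valid, y_pred)
```

If this runs locally and returns a plain float, the same two functions should be usable as the custom metric code.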
Let me know if you have any more questions!
Tom