
# Custom Metric


Hey there, I want to include the following metric in the custom score function of the visual tools and it seems to be failing:

from scipy.stats import ks_2samp
from sklearn.metrics import make_scorer

def ks_stat(y, yhat):
    """
    This function calculates the Kolmogorov-Smirnov (KS) statistic.
    Params
    ------
    y: list-array like
        a list or an array of a binary or continuous variable.
    yhat: list-array-like
    """
    return ks_2samp(yhat[y == 1], yhat[y != 1]).statistic

y_hat = clf.predict_proba(X_test)

ks_scorer = make_scorer(ks_stat, needs_proba=True)

2 Replies

Please post your code snippet using a code block (see icon </> in the toolbar). Can you please post the error you get?

Dataiker

Hi!

I've had a look at the code, and I think I've got a solution for you:

```
from scipy.stats import ks_2samp

def ks_stat(y, yhat):
    """
    This function calculates the Kolmogorov-Smirnov (KS) statistic.
    Params
    ------
    y: list-array like
        a list or an array of a binary or continuous variable.
    yhat: list-array-like
    """
    return ks_2samp(yhat[y == 1], yhat[y != 1]).statistic

def score(y_valid, y_pred):
    """
    Custom scoring function.
    Must return a float quantifying the estimator prediction quality.
      - y_valid is a pandas Series
      - y_pred is a numpy ndarray with shape:
           - (nb_records,) for regression problems and classification problems
             where 'needs probas' (see below) is false
             (for classification, the values are the numeric class indexes)
           - (nb_records, nb_classes) for classification problems where
             'needs probas' is true
    """
    return ks_stat(y_valid, y_pred)
```

This code snippet requires the `Needs Probability` setting to be Off. I've had a quick glance at the Kolmogorov-Smirnov (KS) metric, and this appears to be correct, but I'm not 100% sure here.
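If you would rather compute the KS statistic on predicted probabilities (closer to what the original `predict_proba` snippet was aiming for), a minimal sketch could look like the following. It would need `Needs Probability` switched On, and it rests on two assumptions that are not confirmed in this thread: the task is binary classification, and column 1 of `y_pred` holds the probability of the class encoded as `1` in `y_valid`. Please check your own class mapping before relying on it.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def score(y_valid, y_pred):
    """KS statistic between the positive-class scores of the two true classes."""
    y_true = np.asarray(y_valid)
    # Assumption: y_pred has shape (nb_records, nb_classes) and column 1
    # is the probability of the class encoded as 1 in y_valid.
    pos_proba = np.asarray(y_pred)[:, 1]
    result = ks_2samp(pos_proba[y_true == 1], pos_proba[y_true != 1])
    return float(result.statistic)


# Quick local check on made-up data where the two classes are fully separated.
y_valid = pd.Series([0, 0, 1, 1])
y_pred = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]])
print(score(y_valid, y_pred))  # 1.0
```

Converting both inputs with `np.asarray` avoids any surprises from indexing a numpy array with a pandas Series mask.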

From what I can tell, the problem in the original metric code was that it didn't define a `score` function with the expected signature. In Dataiku DSS, we generally expect a score function with the following signature (similar to the built-in scikit-learn score functions):

```
def score(y_valid, y_pred):
    ...
```

(We also accept score functions with an optional `sample_weight` parameter, or an optional `X_valid` parameter, but in all cases `y_valid` and `y_pred` are required.)
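To illustrate the `sample_weight` variant, here is a rough sketch. `scipy.stats.ks_2samp` takes no weights, so the hypothetical `weighted_ks` helper below computes the largest gap between two weighted empirical CDFs by hand. The helper name, the keyword default `sample_weight=None`, and the convention that the positive class is encoded as `1` are all assumptions made for this example, not something taken from the DSS documentation.

```python
import numpy as np


def weighted_ks(scores_a, weights_a, scores_b, weights_b):
    """Largest absolute gap between two weighted empirical CDFs (hypothetical helper)."""
    # Evaluate both CDFs on every distinct score value seen in either sample.
    grid = np.unique(np.concatenate([scores_a, scores_b]))

    def ecdf(scores, weights):
        order = np.argsort(scores)
        sorted_scores = scores[order]
        cum = np.cumsum(weights[order]) / np.sum(weights)
        # Number of sample points <= each grid value, then look up the
        # cumulative weight share at that count (0.0 when no points qualify).
        idx = np.searchsorted(sorted_scores, grid, side="right")
        return np.concatenate([[0.0], cum])[idx]

    gap = np.abs(ecdf(scores_a, weights_a) - ecdf(scores_b, weights_b))
    return float(np.max(gap))


def score(y_valid, y_pred, sample_weight=None):
    """KS statistic with optional per-record weights (assumes a 1-D y_pred)."""
    y_true = np.asarray(y_valid)
    y_hat = np.asarray(y_pred, dtype=float)
    if sample_weight is None:
        weights = np.ones(len(y_true))  # unweighted case
    else:
        weights = np.asarray(sample_weight, dtype=float)
    pos = y_true == 1  # assumption: positive class is encoded as 1
    return weighted_ks(y_hat[pos], weights[pos], y_hat[~pos], weights[~pos])
```

With unit weights this should give the same value as the unweighted statistic. The sketch does not guard against one of the two classes being empty, so that case would need handling before real use.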

The sklearn `make_scorer` function is not necessary in this context.

Let me know if you have any more questions!

Tom