Hey there, I want to include the following metric in the custom score function of the visual tools and it seems to be failing:
from scipy.stats import ks_2samp
from sklearn.metrics import make_scorer

def ks_stat(y, yhat):
    """
    This function calculates the Kolmogorov-Smirnov (KS) statistic
    Params
    ------
    y: list-array like
        a list or an array of a binary or continuous variable.
    yhat: list-array-like
    """
    return ks_2samp(yhat[y == 1], yhat[y != 1]).statistic

y_hat = clf.predict_proba(X_test)
ks_scorer = make_scorer(ks_stat, needs_proba=True)
Please post your code snippet using a code block (see icon </> in the toolbar). Can you please post the error you get?
Hi!
I've had a look at the code, and I think I've got a solution for you:
from scipy.stats import ks_2samp

def ks_stat(y, yhat):
    """
    This function calculates the Kolmogorov-Smirnov (KS) statistic
    Params
    ------
    y: list-array like
        a list or an array of a binary or continuous variable.
    yhat: list-array-like
    """
    return ks_2samp(yhat[y == 1], yhat[y != 1]).statistic

def score(y_valid, y_pred):
    """
    Custom scoring function.
    Must return a float quantifying the estimator prediction quality.
      - y_valid is a pandas Series
      - y_pred is a numpy ndarray with shape:
          - (nb_records,) for regression problems and classification problems
            where 'needs probas' (see below) is false
            (for classification, the values are the numeric class indexes)
          - (nb_records, nb_classes) for classification problems where
            'needs probas' is true
    """
    return ks_stat(y_valid, y_pred)
This code snippet requires the `Needs Probability` setting to be Off. I've had a quick glance at the Kolmogorov-Smirnov (KS) metric, and this appears to be correct, but I'm not 100% sure here.
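For intuition about what `ks_2samp(...).statistic` measures in this setup, here is a minimal pure-Python sketch of the two-sample KS statistic: the largest gap between the empirical CDFs of the scores for class 1 and the scores for everything else. The function name `ks_statistic_sketch` and the toy data are made up for illustration, and this is not how scipy implements it.

```python
from bisect import bisect_right

def ks_statistic_sketch(y_true, scores):
    """Largest gap between the empirical CDFs of the class-1 scores
    and the scores of all other records (hypothetical helper)."""
    pos = sorted(s for y, s in zip(y_true, scores) if y == 1)
    neg = sorted(s for y, s in zip(y_true, scores) if y != 1)
    if not pos or not neg:
        raise ValueError("both groups must contain at least one score")
    gaps = []
    # The empirical CDFs only change at observed scores, so checking
    # those points is enough to find the maximum gap.
    for x in sorted(set(pos) | set(neg)):
        cdf_pos = bisect_right(pos, x) / len(pos)
        cdf_neg = bisect_right(neg, x) / len(neg)
        gaps.append(abs(cdf_pos - cdf_neg))
    return max(gaps)

# Toy data, for illustration only.
labels = [0, 0, 0, 1, 1, 1]
separated = ks_statistic_sketch(labels, [0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
overlapping = ks_statistic_sketch(labels, [0.1, 0.6, 0.3, 0.2, 0.8, 0.9])
```

A model that separates the two groups completely scores 1, and the value shrinks toward 0 as the two score distributions overlap more.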
From what I can tell, the problem in the original metric code was that it didn't define a `score` function with the expected signature. In Dataiku DSS, we generally expect a score function with the following signature (similar to the built-in scikit-learn score functions):
def score(y_valid, y_pred):
    ...
(We also accept score functions with an optional `sample_weight` parameter, or an optional `X_valid` parameter, but in all cases `y_valid` and `y_pred` are required.)
The sklearn `make_scorer` function is not necessary in this context.
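If you would rather score on probabilities than on predicted classes, a variant along these lines might work. Treat it as a sketch under assumptions, not a confirmed recipe: it relies on the `(nb_records, nb_classes)` shape described in the docstring above, and it assumes that column 1 holds the probability of the class labelled 1, which you should check against your own class mapping. The optional `sample_weight` parameter is accepted but ignored here, since `ks_2samp` takes no weights.

```python
import numpy as np
from scipy.stats import ks_2samp

def score(y_valid, y_pred, sample_weight=None):
    """KS statistic between the scores of class 1 and all other records."""
    y_true = np.asarray(y_valid)
    y_pred = np.asarray(y_pred)
    # Assumption: with probabilities enabled, y_pred is 2-D and
    # column 1 corresponds to the class labelled 1.
    if y_pred.ndim == 2:
        y_pred = y_pred[:, 1]
    # sample_weight is deliberately unused: ks_2samp has no weights argument.
    positives = y_pred[y_true == 1]
    others = y_pred[y_true != 1]
    return float(ks_2samp(positives, others).statistic)

# Toy data, for illustration only.
y_valid = np.array([0, 0, 0, 1, 1, 1])
probas = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
                   [0.3, 0.7], [0.2, 0.8], [0.1, 0.9]])
ks_from_probas = score(y_valid, probas)
ks_from_column = score(y_valid, probas[:, 1])
```

Converting both inputs with `np.asarray` keeps the boolean masks working whether `y_valid` arrives as a pandas Series or as a plain array.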
Let me know if you have any more questions!
Tom