Hyperparameter optimization and model evaluation failing
Hi there,
I am trying to set up custom hyperparameter optimization and model evaluation. I tried using one of the provided code samples (AUC-PR) to see how it works:
```python
from sklearn.metrics import precision_recall_fscore_support

def f_beta_score(y_true, y_pred, sample_weight=None, beta=1.0):
    """
    Custom scoring function using F-beta score.
    Must return a float quantifying the estimator prediction quality.
    - y_true is a numpy ndarray or pandas Series with true labels
    - y_pred is a numpy ndarray with predicted probabilities or class predictions
    - sample_weight is a numpy ndarray with shape (nb_records,) representing sample weights
    - beta is the beta parameter for F-beta score
    """
    # Convert probabilities to class predictions (binary classification)
    y_pred_class = (y_pred[:, 1] > 0.5).astype(int)
    # Calculate precision, recall, and F-beta score
    precision, recall, f_beta, _ = precision_recall_fscore_support(
        y_true, y_pred_class, beta=beta, average='binary', sample_weight=sample_weight
    )
    return f_beta
```
This leads to errors for all models: "Failed to train : <class 'ValueError'> : Custom scoring function failed: too many indices for array"
I would like to get this to work, and I would also like to use the code below in my existing problem. How can I adjust it to work?
```python
from sklearn.metrics import auc, precision_recall_curve

def score(y_valid, y_pred):
    """
    Custom scoring function.
    Must return a float quantifying the estimator prediction quality.
    - y_valid is a pandas Series
    - y_pred is a numpy ndarray with shape:
        - (nb_records,) for regression problems and classification problems
          where 'needs probas' (see below) is false
          (for classification, the values are the numeric class indexes)
        - (nb_records, nb_classes) for classification problems where
          'needs probas' is true
    - [optional] X_valid is a dataframe with shape (nb_records, nb_input_features)
    - [optional] sample_weight is a numpy ndarray with shape (nb_records,)
      NB: this option requires a variable set as "Sample weights"
    """
    # Data to plot precision-recall curve
    precision, recall, thresholds = precision_recall_curve(y_valid, y_pred[:, 1])
    # Use AUC function to calculate the area under the precision-recall curve
    auc_precision_recall = auc(recall, precision)
    return auc_precision_recall
```
Answers
TomWiley, Dataiker
Hi!
For the AUC-PR code sample, have you enabled the `Needs Probability` option in the custom metric options? It should look something like the screenshot I've attached, and it is necessary for this specific example.
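To illustrate why that option matters, here is a minimal sketch with made-up arrays (not DSS code): when `Needs Probability` is off, the scorer receives a 1-D array of predicted class indexes, so `y_pred[:, 1]` fails with "too many indices for array"; when it is on, the scorer receives one probability column per class.

```python
import numpy as np

# Hypothetical shapes, for illustration only

# 'Needs Probability' OFF: y_pred is a 1-D array of predicted class indexes
y_pred_classes = np.array([0, 1, 1, 0])
# y_pred_classes[:, 1]  # would fail with "too many indices for array"

# 'Needs Probability' ON: y_pred has one probability column per class
y_pred_probas = np.array([[0.8, 0.2],
                          [0.3, 0.7],
                          [0.1, 0.9],
                          [0.6, 0.4]])
positive_class_probas = y_pred_probas[:, 1]  # works: shape (nb_records,)
```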
For the F-Beta metric you've described, the following should do the trick:
```python
from sklearn.metrics import precision_recall_fscore_support

def score(y_true, y_pred, sample_weight=None):
    """
    Custom scoring function using F-beta score.
    Must return a float quantifying the estimator prediction quality.
    - y_true is a numpy ndarray or pandas Series with true labels
    - y_pred is a numpy ndarray with predicted probabilities or class predictions
    - sample_weight is a numpy ndarray with shape (nb_records,) representing sample weights
    - beta (set inside the function, since it can no longer be an argument)
      is the beta parameter for the F-beta score
    """
    # Beta parameter for the F-beta score (1.0 corresponds to the standard F1)
    beta = 1.0
    # Convert probabilities to class predictions (binary classification)
    y_pred_class = (y_pred[:, 1] > 0.5).astype(int)
    # Calculate precision, recall, and F-beta score
    precision, recall, f_beta, _ = precision_recall_fscore_support(
        y_true, y_pred_class, beta=beta, average='binary', sample_weight=sample_weight
    )
    return f_beta
```
What I've done here is modify your function so that its signature matches what we expect.
Again, this will require the `Needs Probability` option set to true.
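If it helps, here is a quick way to sanity-check the `score` function above outside of DSS before plugging it in. This is a rough sketch using synthetic data; the variable names are just for illustration and assume the function from the previous block is defined.

```python
import numpy as np

# Synthetic binary-classification data, for a quick local check only
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=100)             # fake true labels (0/1)
positive_probas = rng.random(100)                 # fake positive-class probabilities

# Shape (nb_records, nb_classes), as delivered when 'Needs Probability' is enabled
y_pred = np.column_stack([1 - positive_probas, positive_probas])

print(score(y_true, y_pred))  # should print a single float
```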
Let me know if this helps!
Tom