Hyperparameter optimization and model evaluation failing
Hi there,
I am trying to set up custom hyperparameter optimization and model evaluation. I tried using one of the code samples provided (AUCPR) to see how it works:
```python
from sklearn.metrics import precision_recall_fscore_support

def f_beta_score(y_true, y_pred, sample_weight=None, beta=1.0):
    """
    Custom scoring function using Fbeta score.
    Must return a float quantifying the estimator prediction quality.
     - y_true is a numpy ndarray or pandas Series with true labels
     - y_pred is a numpy ndarray with predicted probabilities or class predictions
     - sample_weight is a numpy ndarray with shape (nb_records,) representing sample weights
     - beta is the beta parameter for Fbeta score
    """
    # Convert probabilities to class predictions (binary classification)
    y_pred_class = (y_pred[:, 1] > 0.5).astype(int)
    # Calculate precision, recall, and Fbeta score
    precision, recall, f_beta, _ = precision_recall_fscore_support(
        y_true, y_pred_class, beta=beta, average='binary', sample_weight=sample_weight
    )
    return f_beta
```
This is leading to errors for all models: "Failed to train : <class 'ValueError'> : Custom scoring function failed: too many indices for array"
I would like to get this to work, and I would also like to get the sample below working in my existing problem. How can I adjust it?
```python
from sklearn.metrics import auc, precision_recall_curve

def score(y_valid, y_pred):
    """
    Custom scoring function.
    Must return a float quantifying the estimator prediction quality.
     - y_valid is a pandas Series
     - y_pred is a numpy ndarray with shape:
        - (nb_records,) for regression problems and classification problems
          where 'needs probas' (see below) is false
          (for classification, the values are the numeric class indexes)
        - (nb_records, nb_classes) for classification problems where 'needs probas' is true
     - [optional] X_valid is a dataframe with shape (nb_records, nb_input_features)
     - [optional] sample_weight is a numpy ndarray with shape (nb_records,)
       NB: this option requires a variable set as "Sample weights"
    """
    # Data to plot the precision-recall curve
    precision, recall, thresholds = precision_recall_curve(y_valid, y_pred[:, 1])
    # Use the AUC function to calculate the area under the precision-recall curve
    auc_precision_recall = auc(recall, precision)
    return auc_precision_recall
```
Answers

TomWiley (Dataiker)
Hi!
For the AUCPR code sample, have you enabled the `Needs Probability` option in the custom metric options? It should look something like the screenshot I've attached, and it is necessary for this specific example.
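For context, the error happens because, without that option, `y_pred` arrives as a 1-D array of class indexes, so selecting a second column fails. A minimal illustration with made-up arrays:

```python
import numpy as np

# With "Needs Probability" off, y_pred is 1-D (numeric class indexes),
# so selecting a column raises the "too many indices" error.
y_pred_classes = np.array([0, 1, 1, 0])              # shape (nb_records,)
try:
    y_pred_classes[:, 1]
except IndexError as err:
    print(err)  # too many indices for array ...

# With "Needs Probability" on, y_pred is 2-D (one column per class),
# and the positive-class column can be selected safely.
y_pred_probas = np.array([[0.8, 0.2], [0.3, 0.7]])   # shape (nb_records, nb_classes)
positive_class_probas = y_pred_probas[:, 1]
```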
For the FBeta metric you've described, the following should do the trick:
```python
from sklearn.metrics import precision_recall_fscore_support

def score(y_true, y_pred, sample_weight=None):
    """
    Custom scoring function using Fbeta score.
    Must return a float quantifying the estimator prediction quality.
     - y_true is a numpy ndarray or pandas Series with true labels
     - y_pred is a numpy ndarray with predicted probabilities
     - sample_weight is a numpy ndarray with shape (nb_records,) representing sample weights
    """
    # Beta parameter for the Fbeta score (set inside the function,
    # since the signature must match what we expect)
    beta = 1.0
    # Convert probabilities to class predictions (binary classification)
    y_pred_class = (y_pred[:, 1] > 0.5).astype(int)
    # Calculate precision, recall, and Fbeta score
    precision, recall, f_beta, _ = precision_recall_fscore_support(
        y_true, y_pred_class, beta=beta, average='binary', sample_weight=sample_weight
    )
    return f_beta
```
What I've done here is modify your function so that its signature matches what we expect.
Again, this will require the `Needs Probability` option set to true.
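As a side note, if you would rather keep `Needs Probability` off, here is a sketch of a variant that scores plain class predictions directly (using sklearn's `fbeta_score`; adapt as needed):

```python
from sklearn.metrics import fbeta_score

def score(y_true, y_pred, sample_weight=None):
    """
    Fbeta on class predictions, assuming 'Needs Probability' is off,
    so y_pred is a 1-D array of numeric class indexes (shape (nb_records,)).
    """
    beta = 1.0
    return fbeta_score(y_true, y_pred, beta=beta, average='binary',
                       sample_weight=sample_weight)
```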
Let me know if this helps!
Tom