Hyperparameter optimization and model evaluation failing

GreyMerchant Registered Posts: 1
edited July 16 in Using Dataiku

Hi there,

I am trying to do a custom Hyperparameter optimization and model evaluation. I tried using one of the code samples provided (AUC-PR) to see how it works:

from sklearn.metrics import precision_recall_fscore_support

def f_beta_score(y_true, y_pred, sample_weight=None, beta=1.0):
    """
    Custom scoring function using F-beta score.
    Must return a float quantifying the estimator prediction quality.
    - y_true is a numpy ndarray or pandas Series with true labels
    - y_pred is a numpy ndarray with predicted probabilities or class predictions
    - sample_weight is a numpy ndarray with shape (nb_records,) representing sample weights
    - beta is the beta parameter for F-beta score
    """
    # Convert probabilities to class predictions (binary classification)
    y_pred_class = (y_pred[:, 1] > 0.5).astype(int)

    # Calculate precision, recall, and F-beta score
    precision, recall, f_beta, _ = precision_recall_fscore_support(y_true, y_pred_class, beta=beta, average='binary', sample_weight=sample_weight)

    return f_beta

This leads to errors for all models: "Failed to train : <class 'ValueError'> : Custom scoring function failed: too many indices for array"
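For context, that `ValueError` wraps a NumPy `IndexError`: `y_pred[:, 1]` only works on a 2-D probability array, and with "Needs probability" disabled the scorer receives a 1-D array of class indexes instead. A minimal sketch of the two shapes (values are made up for illustration):

```python
import numpy as np

# With "Needs probability" disabled, y_pred is 1-D class predictions.
y_pred_classes = np.array([0, 1, 1, 0])       # shape (4,)
try:
    y_pred_classes[:, 1]                      # indexing it as if it were 2-D
except IndexError as e:
    print(e)                                  # "too many indices for array ..."

# With "Needs probability" enabled, y_pred is 2-D per-class probabilities.
y_pred_probas = np.array([[0.8, 0.2],
                          [0.3, 0.7],
                          [0.1, 0.9],
                          [0.6, 0.4]])        # shape (4, 2)
positive = y_pred_probas[:, 1]                # works: positive-class column, shape (4,)
```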

I would like to get this sample working, and ultimately I want to adapt the AUC-PR function below to my existing problem. How can I adjust it to work?

from sklearn.metrics import auc, precision_recall_curve

def score(y_valid, y_pred):
    """
    Custom scoring function.
    Must return a float quantifying the estimator prediction quality.
    - y_valid is a pandas Series
    - y_pred is a numpy ndarray with shape:
        - (nb_records,) for regression problems and classification problems
          where 'needs probas' (see below) is false
          (for classification, the values are the numeric class indexes)
        - (nb_records, nb_classes) for classification problems where
          'needs probas' is true
    - [optional] X_valid is a dataframe with shape (nb_records, nb_input_features)
    - [optional] sample_weight is a numpy ndarray with shape (nb_records,)
      NB: this option requires a variable set as "Sample weights"
    """
    # Data to plot precision-recall curve
    precision, recall, thresholds = precision_recall_curve(y_valid, y_pred[:, 1])
    # Use AUC function to calculate the area under the precision-recall curve
    auc_precision_recall = auc(recall, precision)
    return auc_precision_recall
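For reference, this AUC-PR function can be sanity-checked outside Dataiku with a tiny synthetic example (labels and probabilities below are made up for illustration; the two-column `y_pred` is the shape produced when 'needs probas' is true):

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve

def score(y_valid, y_pred):
    # Area under the precision-recall curve, using the positive-class column
    precision, recall, thresholds = precision_recall_curve(y_valid, y_pred[:, 1])
    return auc(recall, precision)

y_valid = np.array([0, 0, 1, 1])
y_pred = np.array([[0.9, 0.1],
                   [0.6, 0.4],
                   [0.65, 0.35],
                   [0.2, 0.8]])
print(score(y_valid, y_pred))  # ≈ 0.79
```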

Answers

  • TomWiley
    Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 2
    edited July 17

    Hi!

    For the AUC-PR code sample, have you enabled the `Needs Probability` option in the custom metric settings? It should look something like the screenshot I've attached, and it is necessary for this specific example.

    Screenshot 2024-01-25 at 14.09.04.png

    For the F-Beta metric you've described, the following should do the trick:

    from sklearn.metrics import precision_recall_fscore_support
    
    def score(y_true, y_pred, sample_weight=None):
        """
        Custom scoring function using F-beta score.
        Must return a float quantifying the estimator prediction quality.
        - y_true is a numpy ndarray or pandas Series with true labels
        - y_pred is a numpy ndarray with predicted probabilities (requires `Needs Probability`)
        - sample_weight is a numpy ndarray with shape (nb_records,) representing sample weights
        """
        # beta is fixed inside the function, since the expected signature
        # does not accept extra keyword arguments
        beta = 1.0
    
        # Convert probabilities to class predictions (binary classification)
        y_pred_class = (y_pred[:, 1] > 0.5).astype(int)
    
        # Calculate precision, recall, and F-beta score
        precision, recall, f_beta, _ = precision_recall_fscore_support(y_true, y_pred_class, beta=beta, average='binary', sample_weight=sample_weight)
    
        return f_beta

    What I've done here is modify your function so that its signature matches what Dataiku expects.

    Again, this will require the `Needs Probability` option set to true.
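    If you want a metric that keeps working whichever way the option is set, a defensive variant (my own sketch, not an official Dataiku sample) can branch on the dimensionality of `y_pred`:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def score(y_true, y_pred, sample_weight=None):
    """F-beta scorer tolerating both 1-D class and 2-D probability inputs."""
    beta = 1.0
    if y_pred.ndim == 2:
        # 'Needs Probability' enabled: threshold the positive-class column
        y_pred_class = (y_pred[:, 1] > 0.5).astype(int)
    else:
        # 'Needs Probability' disabled: values are already class indexes
        y_pred_class = y_pred.astype(int)
    _, _, f_beta, _ = precision_recall_fscore_support(
        y_true, y_pred_class, beta=beta, average='binary', sample_weight=sample_weight
    )
    return f_beta
```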

    Let me know if this helps!

    Tom
