Custom Code Metric

I'm attempting to make the best use of the "Custom Code" option for hyperparameter optimization and have a few questions. For reference, here are the comments on how to write the custom function:

# - y_pred is a numpy ndarray with shape:
#   - (nb_records,) for regression problems and classification problems
#     where 'needs probas' (see below) is false
#     (for classification, the values are the numeric class indexes)
#   - (nb_records, nb_classes) for classification problems where 'needs probas' is true
  • There are no real issues when "needs_probas" is false: by appending y_pred as a column to X_valid, I can see which rows were predicted as "True" for my binary classification problem (for a given threshold).
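The "append y_pred to X_valid" approach for the needs_probas-false case can be sketched roughly as follows. This is a minimal sketch, not DSS code: it assumes X_valid is a pandas DataFrame whose rows are in the same order as y_pred and that class index 1 corresponds to "True". The helper name rows_predicted_positive is hypothetical.

```python
import numpy as np
import pandas as pd


def rows_predicted_positive(X_valid: pd.DataFrame, y_pred: np.ndarray) -> pd.DataFrame:
    """Hypothetical helper: return the rows of X_valid predicted as class index 1.

    Assumes needs_probas is false, so y_pred holds class indexes with
    shape (nb_records,), in the same row order as X_valid.
    """
    ds = X_valid.copy()  # work on a copy so the caller's frame is untouched
    ds["prediction"] = np.asarray(y_pred)  # assigned by position, not by index label
    return ds[ds["prediction"] == 1]
```

Working on a copy keeps the scoring function free of side effects on the frame it receives.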

With "needs_probas" set to true, I run into some problems.

  • It appears that the shape of y_pred differs depending on whether the model is being trained or scored. Here's the code I've implemented, which seems to solve the problem (again for a binary classification problem). Should this be necessary, or am I missing something?
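    import numpy as np

    if np.ndim(y_pred) == 2:
        # final scoring: full predict_proba output, shape (nb_records, 2)
        ds['probas'] = y_pred[:, 1]
    else:
        # hyperparameter search / k-fold: positive-class column only, shape (nb_records,)
        ds['probas'] = y_pred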
  • With "needs_probas" false, the scoring seems to depend on the threshold (a row is predicted "True" when its proba is above the threshold), yet the threshold does not appear to be passed to the scoring function. Is this correct and expected, or am I missing something? Maybe the "needs_threshold" property just isn't implemented in DSS for binary classification?




You did indeed stumble on a not-so-nice behavior of the custom scoring handling: it passes the full output of predict_proba when doing the final scoring, but only the second column (the positive class) when doing hyperparameter search and k-fold cross-validation. Your solution is essentially the best one can come up with.
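The workaround can be wrapped in a small, testable helper that always returns a 1-D array of positive-class probabilities. This is a sketch assuming the behavior described in this reply (full predict_proba output of shape (nb_records, 2) at final scoring, only the positive-class column during hyperparameter search and k-fold); the name positive_class_probas is hypothetical.

```python
import numpy as np


def positive_class_probas(y_pred) -> np.ndarray:
    """Hypothetical helper: return positive-class probabilities as a 1-D array.

    Handles both shapes a binary custom metric may receive when needs_probas is true.
    """
    y_pred = np.asarray(y_pred)
    if y_pred.ndim == 2:
        # Final scoring: full predict_proba output, shape (nb_records, 2);
        # column 1 holds the probability of the positive class.
        return y_pred[:, 1]
    # Hyperparameter search / k-fold: already the positive-class column, shape (nb_records,).
    return y_pred
```

Calling it once at the top of the scoring function keeps the rest of the metric code independent of which phase it is being called from.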

You needn't worry about the threshold, since it is computed after scoring with your code: DSS tries different threshold values and calls your scoring code with a single column corresponding to the positive class.
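To make "trying different threshold values" concrete, here is a generic threshold sweep over positive-class probabilities. It illustrates the concept only and is not DSS's actual threshold-optimization code; the F1 metric, the strict "above the threshold" rule, the candidate thresholds, and the function names are all assumptions.

```python
import numpy as np


def f1_at_threshold(y_true: np.ndarray, probas: np.ndarray, threshold: float) -> float:
    """F1 score when a row is predicted positive if its proba is above `threshold`."""
    y_hat = (probas > threshold).astype(int)
    tp = int(np.sum((y_hat == 1) & (y_true == 1)))
    fp = int(np.sum((y_hat == 1) & (y_true == 0)))
    fn = int(np.sum((y_hat == 0) & (y_true == 1)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, probas, thresholds):
    """Hypothetical sweep: score each candidate cut-off and keep the best one."""
    scores = [f1_at_threshold(y_true, probas, t) for t in thresholds]
    i = int(np.argmax(scores))  # first index of the maximum on ties
    return float(thresholds[i]), scores[i]
```

Ties resolve to the earliest threshold in the list, because np.argmax returns the first maximal index.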


Hi @rmoore,

This has been fixed in release 8.0.2.

More precisely, when performing a hyperparameter search, the custom metric function can now correctly assume that y_pred has shape (N, 2) for binary classification with needs_proba == True.
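With that fix, a binary needs_proba metric can index the positive-class column directly instead of branching on the shape. The sketch below computes a Brier score (mean squared difference between positive-class probability and the 0/1 label) as an example metric. The score(y_valid, y_pred) signature is assumed rather than quoted from the DSS template, y_valid is assumed to hold 0/1 class indexes, and lower values are better for this metric.

```python
import numpy as np


def score(y_valid, y_pred) -> float:
    """Example custom metric (assumed signature): Brier score for binary classification.

    Assumes DSS 8.0.2+ with needs_proba true, so y_pred always has shape (N, 2).
    Lower is better.
    """
    y_true = np.asarray(y_valid, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_pred.ndim != 2 or y_pred.shape[1] != 2:
        # Fail loudly if the (N, 2) assumption does not hold in this DSS version.
        raise ValueError(f"expected y_pred of shape (N, 2), got {y_pred.shape}")
    p_pos = y_pred[:, 1]  # probability of the positive class
    return float(np.mean((p_pos - y_true) ** 2))
```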

