I'm attempting to make the best use of the "Custom Code" option for hyperparameter optimization and have a few questions. For reference, here are the comments on how to write the custom function:
```
# - y_pred is a numpy ndarray with shape:
#    - (nb_records,) for regression problems and classification problems
#      where 'needs probas' (see below) is false
#      (for classification, the values are the numeric class indexes)
#    - (nb_records, nb_classes) for classification problems where
#      'needs probas' is true
```
With "needs_probas" set to true, I run into some problems.
```python
import numpy as np

# 'ds' is a DataFrame assembled earlier in my custom metric code
if len(np.shape(y_pred)) == 2:
    # final scoring: y_pred is the full predict_proba output, (nb_records, nb_classes)
    ds['probas'] = y_pred[:, 1]
else:
    # hyperparameter search / k-fold: y_pred is already the positive-class column
    ds['probas'] = y_pred
```
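For context, here is a minimal self-contained sketch of how a branch like this could sit inside a complete custom metric. The `score(y_valid, y_pred)` signature follows the custom-code template, but the choice of `roc_auc_score` and the assumption that `y_valid` holds 0/1 class indexes are mine, not from the original post:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def score(y_valid, y_pred):
    # Normalize y_pred to a 1-D array of positive-class probabilities,
    # whichever of the two shapes DSS passes in.
    if np.ndim(y_pred) == 2:
        probas = y_pred[:, 1]        # full predict_proba output
    else:
        probas = np.asarray(y_pred)  # already the positive-class column
    # AUC is threshold-free, so it works identically in both cases.
    return roc_auc_score(np.ravel(y_valid), probas)
```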
You have indeed stumbled on a not-so-nice behavior of the custom scoring handling: it passes the full output of predict_proba when doing the final scoring, but only the second column (the positive class) when doing hyperparameter search and k-fold. Your solution is essentially the best one available.
You needn't worry about the threshold, as it is computed after scoring with your code: DSS will try different threshold values and call your scoring code with a single column corresponding to the positive class.
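Put differently, as long as the metric itself is threshold-free, the single positive-class column can be consumed as-is. A minimal sketch under the same assumptions as above (the signature and the choice of `average_precision_score` are mine):

```python
import numpy as np
from sklearn.metrics import average_precision_score

def score(y_valid, y_pred):
    # DSS handles threshold selection separately, so a ranking metric
    # can use the positive-class probabilities directly.
    probas = y_pred[:, 1] if np.ndim(y_pred) == 2 else np.asarray(y_pred)
    return average_precision_score(np.ravel(y_valid), probas)
```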
Hi @rmoore,

This has been fixed in release 8.0.2. More precisely, the custom metric function can now correctly assume a y_pred shape of (N, 2) for binary classification with needs_proba == True, even when performing a hyperparameter search.
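On 8.0.2 and later, the shape test therefore becomes unnecessary for binary classification. A sketch of the simplified function, under the same assumed signature and illustrative metric as above:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def score(y_valid, y_pred):
    # On DSS >= 8.0.2, y_pred is (N, 2) for binary classification with
    # needs_proba == True, during hyperparameter search as well as at
    # final scoring time, so column 1 can be taken unconditionally.
    return roc_auc_score(np.ravel(y_valid), y_pred[:, 1])
```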