I'm attempting to make the best use of the "Custom Code" option for hyperparameter optimization and have a few questions. For reference, here are the comments on how to write the custom function:
# - y_pred is a numpy ndarray with shape:
#   - (nb_records,) for regression problems and classification problems
#     where 'needs probas' (see below) is false
#     (for classification, the values are the numeric class indexes)
#   - (nb_records, nb_classes) for classification problems where
#     'needs probas' is true
With "needs probas" set to true, I run into some problems. I worked around them with:
if len(np.shape(y_pred)) == 2:
    # scoring the model: y_pred has one column per class
    ds['probas'] = y_pred[:, 1]
else:
    # training the model: y_pred is already one-dimensional
    ds['probas'] = y_pred
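To illustrate, the shape-handling logic above can be wrapped into a complete custom metric function. This is only a sketch: the `score(y_true, y_pred)` name and signature, and the use of AUC as the metric, are assumptions for illustration and may differ from your actual custom-code setup.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def score(y_true, y_pred):
    """Hypothetical custom metric handling both y_pred shapes.

    The name, signature, and choice of AUC are illustrative assumptions,
    not the exact DSS contract.
    """
    y_pred = np.asarray(y_pred)
    if y_pred.ndim == 2:
        # final scoring: full probability matrix, one column per class;
        # keep only the positive-class column
        probas = y_pred[:, 1]
    else:
        # hyperparameter search / k-fold: already the positive-class column
        probas = y_pred
    return roc_auc_score(y_true, probas)
```

The same metric value comes out whichever shape is passed in, which is exactly the property the workaround is after.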
You have indeed stumbled on a not-so-nice behavior of the custom scoring handling: DSS passes the full output of predict_proba when doing the final scoring, but only the second column (the positive class) during hyperparameter search and k-fold cross-validation. Your solution is essentially the best one can come up with.
You needn't worry about the threshold, as it is computed after scoring with your code: DSS will try different threshold values and call your scoring code with a single column corresponding to the positive class.
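Since the threshold sweep happens outside your code, a threshold-dependent metric only ever needs to binarize the single positive-class column it receives. A minimal sketch, assuming an F1-style metric and an illustrative `threshold` parameter (the actual threshold values are chosen by DSS, not by you):

```python
import numpy as np

def f1_at_threshold(y_true, probas, threshold=0.5):
    """Illustrative F1 computed from a 1-D positive-class probability column."""
    y_true = np.asarray(y_true)
    # binarize the positive-class probabilities at the candidate threshold
    y_hat = (np.asarray(probas) >= threshold).astype(int)
    tp = np.sum((y_hat == 1) & (y_true == 1))
    fp = np.sum((y_hat == 1) & (y_true == 0))
    fn = np.sum((y_hat == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Running this at several thresholds over the same column is, in effect, what the threshold search does on your behalf.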