Random forest classification

nuvitu Registered Posts: 8 ✭✭✭✭
Hello all,

I am using the Random Forest Classification algorithm in Dataiku.

However, the confusion matrix still shows a threshold cut-off, which I thought applied only to regression. Can anyone explain why? Also, how do I choose Random Forest Regression so that the output is a probability?

Answers

  • Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753
    Hi

    All classification algorithms in DSS are meant to output classes (either binary, 0/1, or multi-class). Most classification algorithms do not predict classes directly: they predict probabilities and then apply a threshold to the probability.

    Random Forest Classification is one of these: it predicts a probability and then applies a threshold to it. If you deploy a Random Forest Classification model in DSS, it outputs both the probability and the thresholded predicted class. So if you are only interested in the probability, you can ignore the threshold and just use the predicted probability columns in the result.

    In DSS, Random Forest Regression applies only to continuous scoring (i.e. predicting a numerical variable like price, rather than a discrete variable like color).
  • Tapang Registered Posts: 1 ✭✭✭✭
    I have deployed a model, but in the output predicted dataset I do not see a column for probabilities. Instead, I see the following added columns:
    prediction
    Decimal
    error
    error_decile
    abs_error_decile

    How do I interpret my result based on the prediction column?
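
As a rough illustration of the probability-then-threshold behaviour described in the first answer, here is a minimal Python sketch. It is not DSS code: the vote table, the `apply_threshold` helper, and the `>=` comparison are all assumptions made for the example, and the probability is approximated as the fraction of trees voting for class 1.

```python
# Toy stand-in for a random forest: each inner list holds the 0/1 votes
# of five hypothetical trees for one row of data (made-up values).
tree_votes = [
    [0, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 1, 1, 1],
]

# Probability of class 1 per row, taken here as the share of trees voting 1.
probabilities = [sum(votes) / len(votes) for votes in tree_votes]


def apply_threshold(probas, threshold):
    """Return 1 where the probability reaches the threshold, else 0.

    Using >= is an assumption of this sketch, not a documented DSS rule.
    """
    return [1 if p >= threshold else 0 for p in probas]


# The same probabilities give different classes under different cut-offs.
default_classes = apply_threshold(probabilities, 0.5)
strict_classes = apply_threshold(probabilities, 0.7)

print(probabilities)    # [0.2, 0.4, 0.6, 1.0]
print(default_classes)  # [0, 0, 1, 1]
print(strict_classes)   # [0, 0, 0, 1]
```

Raising the cut-off from 0.5 to 0.7 changes only the predicted classes. The probabilities stay the same, which is why the probability columns can be used on their own.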