
Logistic Regression

cwentz
Level 3

Hello all, 

I posted a question in the community and Maureen responded, but her answer left me with more questions. She has not been able to respond again, so I was hoping someone here could elaborate for me.

Here is my question: 

Thank you @MaureenP. I appreciate your explanation. I have not heard of the log of the odds ratio before. I presume that this number works similarly: a variable's log of the odds ratio, if positive, is the % increase in the likelihood of 1 or True, and if negative, is the % decrease in the likelihood of 1 or True.

In the attached example, does it read that there is a 60% increased likelihood of retention for that variable and a 30% decreased likelihood of retention for the next variable?

Here is the link to my reply to her: https://community.dataiku.com/t5/Using-Dataiku-DSS/Logistic-regression-output/m-p/12756#M5658

 

Thank you so much for any help. I am still learning the differences between Dataiku and SPSS/Excel.

Screen Shot 2021-01-06 at 3.35.50 PM.png

3 Replies
MehdiH
Dataiker

Hi @cwentz ,

Thank you for reaching out.

The screenshot that you sent looks like the regression coefficients tab.

Logistic regression coefficients are a bit tricky to interpret. I'll try to give a simple explanation. To understand the log of the odds, let us first define the relation between the probability and the odds:

The odds (in favor) of an event or a proposition is the ratio of the probability that the event will happen to the probability that the event will not happen


For example, the odds that a day of the week is a weekend day are 2:5, where 2 is the number of weekend days and 5 is the number of other days. In terms of probability, this is 2 / (2 + 5) = 2 / 7. In general, if the odds are S:F and the probability is p, we have: p = S / (S + F) or S/F = p / (1-p)
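The conversion between odds and probability can be checked with a few lines of Python. This is only a minimal sketch of the weekend example above; the function names are made up for illustration and are not part of Dataiku.

```python
from fractions import Fraction


def odds_to_probability(successes, failures):
    """Convert odds S:F into a probability p = S / (S + F)."""
    return Fraction(successes, successes + failures)


def probability_to_odds(p):
    """Convert a probability p into odds p / (1 - p)."""
    return p / (1 - p)


# Weekend example: 2 weekend days against 5 other days.
p_weekend = odds_to_probability(2, 5)
print(p_weekend)                       # 2/7
print(probability_to_odds(p_weekend))  # 2/5
```

Exact fractions are used here only so that the round trip from 2:5 to 2/7 and back is visible without rounding.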

The log of the odds in the previous example is log(2/5). In the general case, if we define the probability as p, the log of the odds is L = log(p/(1-p))

Now, coming back to the interpretation of the logistic regression: the coefficients that you see are indeed the contribution of each variable to the prediction's log of the odds, L = log(p/(1-p)) = intercept + sum(coefficients*variables), where log is the natural logarithm. From this value, you can deduce the probability as p = exp(L)/(1 + exp(L)).
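As a rough illustration of the two formulas above, the sketch below computes the log of the odds for one row and converts it back to a probability. The intercept, coefficient values, and feature names are hypothetical and are not taken from the screenshot; it also assumes the feature values are the ones the model actually sees after any preprocessing.

```python
import math


def log_odds(p):
    """L = log(p / (1 - p)), using the natural logarithm."""
    return math.log(p / (1 - p))


def probability_from_log_odds(L):
    """Inverse of log_odds: p = exp(L) / (1 + exp(L))."""
    return math.exp(L) / (1 + math.exp(L))


# Hypothetical model: these numbers are assumptions for illustration only.
intercept = -0.5
coefficients = {"feature_a": 0.6, "feature_b": -0.3}
row = {"feature_a": 1.0, "feature_b": 2.0}

# Log of the odds is the intercept plus the sum of coefficient * value.
L = intercept + sum(coefficients[name] * row[name] for name in coefficients)
p = probability_from_log_odds(L)

print(f"log-odds = {L:.3f}, probability = {p:.3f}")
```

The point of the sketch is that the coefficients add up on the log-odds scale, and the probability only appears after the final conversion.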

cwentz
Level 3
Author

@MehdiH Thank you!

I do understand this a little more. However, without doing any math and just looking at the coefficients tab, could it be said that the number is a weight, negative or positive, on the variable's contribution to the event happening? I'm trying to work out how to word this for the intended audience, the non-data person.

MehdiH
Dataiker

The "handwaving" explanation would be that the coefficients measure the contribution of each variable to the target probability.

The higher the coefficient, the more the variable increases the probability; a negative coefficient decreases it.

However, you need to do a bit of math (basically exp(coef)/(1 + exp(coef))) to get an approximate idea of the influence of the coefficient on the probability.
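To make that concrete, here is a small sketch with two hypothetical coefficients, 0.6 and -0.3 (chosen to echo the 60% / 30% reading in the question; they are not read from the screenshot). It relies on a standard property of logistic regression: a one-unit increase in a variable multiplies the odds by exp(coef), and the resulting change in probability depends on the starting probability. The formula exp(coef)/(1 + exp(coef)) is the special case where the starting probability is 0.5. If features were rescaled during preprocessing, "one unit" means one unit of the rescaled feature.

```python
import math


def sigmoid(L):
    """Convert log-odds L to a probability (same as exp(L) / (1 + exp(L)))."""
    return 1 / (1 + math.exp(-L))


def odds_ratio(coef):
    """Factor by which the odds are multiplied for a one-unit increase."""
    return math.exp(coef)


def probability_after_unit_increase(baseline_p, coef):
    """Probability after adding coef to the log-odds of baseline_p."""
    baseline_log_odds = math.log(baseline_p / (1 - baseline_p))
    return sigmoid(baseline_log_odds + coef)


# Hypothetical coefficients, for illustration only.
for coef in (0.6, -0.3):
    print(f"coef={coef:+.1f}: odds multiplied by {odds_ratio(coef):.2f}")
    for baseline_p in (0.1, 0.5, 0.9):
        new_p = probability_after_unit_increase(baseline_p, coef)
        print(f"  baseline p={baseline_p:.1f} -> p={new_p:.3f}")
```

So a coefficient of 0.6 is not a 60% increase in likelihood: it multiplies the odds by about 1.82, and the probability moves by a different amount depending on where it started. For a non-technical audience, "positive coefficients push the prediction toward True, negative ones push it away, and larger magnitudes push harder" is a safe summary.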
