How can I add p-values estimation to my logistic regressions

SimonDeschamps
Level 1
I am currently running a statistical analysis on absenteeism data, studying the influence of multiple factors on employees' presence at work. However, whenever I use logistic regression, I can't get p-values for the factors' coefficients. The only exception is when I use a PCA to reduce the dimension, but in that case I can't interpret the results, which does not help me either.

Does anyone know how to recover that on Dataiku?

Thanks

SD

Accepted Solutions
Clément_Stenac Dataiker
Re: How can I add p-values estimation to my logistic regressions
Hi,

DSS only shows p-values when there are fewer than 1000 coefficients (after preprocessing, so each categorical value becomes a coefficient). Even with fewer than 1000 coefficients, computing p-values is not always possible due to numerical issues.

Beware that logistic regression in DSS is always regularized, and p-values are not strictly defined for regularized regressions.
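
To make the "each categorical value becomes a coefficient" point concrete, here is a minimal sketch using pandas and NumPy on a tiny made-up column (the `site` column and its values are hypothetical). It counts the columns produced by one-hot encoding, which is the number to compare against the 1000 limit. It also shows one well-known general cause of numerical trouble when computing standard errors: if every category level is kept alongside an intercept, the columns are linearly dependent and the matrix that has to be inverted is singular. This is an assumption about what "numerical issues" can mean in general, not a description of DSS internals.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical feature with three levels.
df = pd.DataFrame({"site": ["A", "B", "C", "A", "B", "C", "A", "B"]})

# Keep every level and add an intercept: 4 columns in total.
full = pd.get_dummies(df["site"], dtype=float)
full.insert(0, "const", 1.0)

# Drop one reference level and add an intercept: 3 columns in total.
reduced = pd.get_dummies(df["site"], drop_first=True, dtype=float)
reduced.insert(0, "const", 1.0)

# Number of coefficients the model would have to estimate.
n_coef_full = full.shape[1]
n_coef_reduced = reduced.shape[1]

# A rank lower than the column count means the columns are collinear,
# so standard errors (and hence p-values) cannot be computed reliably.
rank_full = np.linalg.matrix_rank(full.to_numpy())
rank_reduced = np.linalg.matrix_rank(reduced.to_numpy())

print(f"all levels:   {n_coef_full} columns, rank {rank_full}")
print(f"drop one:     {n_coef_reduced} columns, rank {rank_reduced}")
```

In the first design the intercept equals the sum of the three dummy columns, so the rank is one less than the number of columns. Dropping a reference level removes that dependency.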

3 Replies
SimonDeschamps
Level 1
Thank you for that (really) quick answer. However, I only have 14 columns, with 52 categorical values in total, so I am guessing that I'm facing those "numerical issues".

Could you explain what they are and how to get around them?

Many thanks
Alex_Combessie Dataiker
Hi,
If you want to use p-values for rigorous statistical tests, I would advise using a logistic regression library which does not apply regularization. The scikit-learn version we use in the visual machine learning feature is regularized, which is better for classification performance, but worse for interpretability.
There is a Python implementation of unregularized logistic regression (a.k.a. logit) in the statsmodels library. Alternatively, you could use R, for example its built-in glm function.
Cheers,
Alex