How can I add p-value estimation to my logistic regressions?
SimonDeschamps
Registered Posts: 2 ✭✭✭✭
I am currently leading a statistical analysis of absenteeism data. In this study, I am looking at the influence of multiple factors on employees' presence at work. But whenever I use logistic regression, I can't get p-values for the factors' coefficients (except when I use a PCA to reduce the dimension, but in that case I can't interpret the results, which does not serve my case either).
Does anyone know how to get these p-values in Dataiku?
Thanks
SD
Best Answer

Hi,
DSS only shows p-values when there are fewer than 1000 coefficients (counted after preprocessing, so each categorical value becomes a coefficient). Even with fewer than 1000 coefficients, computing p-values is not always possible due to numerical issues.
Beware that logistic regression in DSS is always regularized, and p-values are not strictly defined for regularized regressions.
Answers

Thank you for that (really) quick answer. However, I only have 14 columns, with 52 categorical values in total, so I am guessing that I'm facing those "numerical issues".
Could you explain what they are and how to get around them?
Many thanks 
Hi,
If you want to use p-values for rigorous statistical tests, I would advise using a logistic regression library which does not apply regularization. The scikit-learn implementation we use in the visual machine learning feature is regularized, which is better for classification performance, but less so for interpretability.
There is a Python implementation of unregularized logistic regression (a.k.a. logit) in the statsmodels library. Alternatively, in R you could use the built-in glm function.
Cheers,
Alex