How can I add p-value estimation to my logistic regressions?
SimonDeschamps
Registered Posts: 2 ✭✭✭✭
I am currently leading a statistical analysis of absenteeism data. In this study, I am looking at the influence of multiple factors on employees' presence at work. But whenever I use logistic regression, I can't get p-values for the factors' coefficients (except when I use a PCA to reduce the dimension, but in that case I can't interpret the results, which does not serve my case either).
Does anyone know how to get them in Dataiku?
Thanks
SD
Best Answer
-
Hi,
DSS only shows p-values when there are fewer than 1000 coefficients after preprocessing (each categorical value becomes its own coefficient). Even with fewer than 1000 coefficients, computing p-values is not always possible because of numerical issues.
Beware that logistic regression in DSS is always regularized, and p-values are not strictly defined for regularized regressions.
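To get a rough idea of how many coefficients a dataset produces once each categorical value becomes its own coefficient, something like the sketch below can help. It is only an approximation: the rows, column names, and the helper `estimate_coefficient_count` are made up for illustration, and the exact preprocessing DSS applies may give a different count.

```python
# Hypothetical sample rows; column names and values are invented for illustration.
rows = [
    {"department": "sales", "contract": "full_time", "age": 34},
    {"department": "it", "contract": "part_time", "age": 45},
    {"department": "hr", "contract": "full_time", "age": 29},
    {"department": "sales", "contract": "full_time", "age": 51},
]
categorical_columns = ["department", "contract"]
numeric_columns = ["age"]


def estimate_coefficient_count(rows, categorical_columns, numeric_columns):
    """Rough estimate: one coefficient per distinct categorical value,
    plus one per numeric column (the intercept is not counted)."""
    n_dummy = sum(
        len({row[col] for row in rows}) for col in categorical_columns
    )
    return n_dummy + len(numeric_columns)


n_coefficients = estimate_coefficient_count(rows, categorical_columns, numeric_columns)
print(n_coefficients)
```

If the number is far below 1000 and p-values are still missing, the size limit is probably not the cause.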
Answers
-
Thank you for that (really) quick answer. However, I only have 14 columns, with 52 categorical values in total, so I am guessing that I'm facing those "numerical issues".
Could you explain what they are and how to get around them?
Many thanks
-
Hi,
If you want to use p-values for rigorous statistical tests, I would advise using a logistic regression library which does not apply regularization. The scikit-learn version we use in the visual machine learning feature is regularized, which is better for classification performance, but less so for interpretability.
There is a Python implementation of unregularized logistic regression (a.k.a. logit) in the statsmodels library. Alternatively, you could use R, for example its built-in glm function.
Cheers,
Alex