How can I add p-value estimation to my logistic regressions?
I am currently leading a statistical analysis on absenteeism data. In this study, I am examining the influence of multiple factors on employees' presence at work. But whenever I use logistic regression, I can't get p-values for the factors' coefficients (except when I use a PCA to reduce the dimension, but in that case I can't interpret the results, which does not serve my purpose either).
DSS only shows p-values when there are fewer than 1000 coefficients (counted after preprocessing, so each categorical value becomes a coefficient). Even with fewer than 1000 coefficients, computing p-values is not always possible due to numerical issues.
Beware that logistic regression in DSS is always regularized, and p-values are not strictly defined for regularized regressions.
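To get a rough idea of how many coefficients a dataset produces once each categorical value becomes its own column, a minimal sketch with pandas could look like the following. The column names and values are hypothetical, and pandas' `get_dummies` only approximates what DSS preprocessing does, so treat the count as an estimate rather than the exact number DSS will use.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "department": ["sales", "hr", "it", "sales"],
    "contract": ["full", "part", "full", "full"],
    "age": [34, 45, 29, 51],
})

# One-hot encode the categorical columns: each distinct value becomes a column.
encoded = pd.get_dummies(df, columns=["department", "contract"])

# Each resulting column corresponds to one coefficient in the regression.
n_coefficients = encoded.shape[1]
print(n_coefficients)
```

Here the three departments, two contract types and one numeric column add up to six columns. A categorical feature with many distinct values can push the total past the limit quickly.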
Hi, If you want to use p-values for rigorous statistical tests, I would advise using a logistic regression library that does not apply regularization. The scikit-learn version we use in the visual machine learning feature is regularized, which is better for classification performance, but less so for interpretability. There is a Python implementation of unregularized logistic regression (a.k.a. logit) in the statsmodels library. Alternatively, in R you could use the glm function with a binomial family. Cheers, Alex