This website uses cookies. By browsing this website, you consent to the use of cookies. Learn more.

Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Highlighted
I am currently leading a statistical analysis on absenteism data. In this study, I am studying the influence of multiple factors on employees' presence at work. But anytime i use the logistic regression i can't get p-values for the factors' coefficents (except when I use a PCA to reduce the dimension but in that case I can't interpret the results, which does not serve my case either)

Does anyone know how to recover that on Dataiku?

Thanks

SD

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Does anyone know how to recover that on Dataiku?

Thanks

SD

1 Solution

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

DSS only shows p-values when there are less than 1000 coefficients (after preprocessing - so each categorical value becomes a coefficient). Even if you have less than 1000 coefficients, computing p-values is not always possible due to numerical issues.

Beware that logistic regression in DSS is always regularized, and p-values are not strictly defined for regularized regressions

3 Replies

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

DSS only shows p-values when there are less than 1000 coefficients (after preprocessing - so each categorical value becomes a coefficient). Even if you have less than 1000 coefficients, computing p-values is not always possible due to numerical issues.

Beware that logistic regression in DSS is always regularized, and p-values are not strictly defined for regularized regressions

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Re: How can I add p-values estimation to my logistic regressions

Could you explain what they are and how to get around?

Many thanks

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Re: How can I add p-values estimation to my logistic regressions

If you want to use p-values for rigorous statistical tests, I would advise using a logistic regression library which does not apply regularization. The scikit-learn version we use in the visual machine learning feature is regularized, which is better for classification performance, but less so for interpretability.

There is a Python implementation for unregularized logistic regression (a.k.a. logit) in the library statsmodel. Alternatively, you could use many R packages such as glm.

Cheers,

Alex