This article discusses using the logistic regression coefficient table output to assess uncertainty. It is the last part of a 5-part series focused on understanding and interpreting the logistic regression coefficient table. Here is a brief summary of the focus of each of the 5 parts.
- Part 1: How the logistic regression coefficient table compares to the corresponding table from least-squares regression.
- Part 2: Use #1 — determining what variables in the logistic regression matter.
- Part 3: Use #2 — assessing the impact of each of the $x$-variables on the dependent variable.
- Part 4: Use #3 — predicting the probability that the dependent variable is 1.
- Part 5: This part. Use #4 — assessing the uncertainty in the regression coefficients.
Coefficient Table Use #4: Assessing Uncertainty
In logistic regression, assessing the uncertainty in the estimated coefficients is virtually the same as for least-squares regression (click here for a review of assessing uncertainty in least-squares regression). In both logistic regression and least-squares regression, the regression coefficient table will include a column for the regression coefficients followed by a column of standard errors, then by a column of test statistics, and finally a column of $p$-values. The table below shows the coefficient table output for the Kid Creative regression (recall that what we are modeling is the probability of buying the Kid Creative magazine).
Note that the test statistics are labeled “z value” and the $p$-values are labeled “P(>|t|)” in the table above.
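To make the connection between the columns concrete, here is a minimal Python sketch of how a z value and a two-sided $p$-value are typically obtained from a coefficient and its standard error. It assumes the software reports Wald statistics (the coefficient divided by its standard error, compared against a standard normal distribution), which is common but is an assumption about the output rather than something taken from the table. The function name `wald_z_and_p` is my own, and the numbers are the Residence Length coefficient and standard error quoted later in this article.

```python
import math


def wald_z_and_p(coef, std_err):
    """Return the Wald z statistic and its two-sided normal p-value.

    Assumes the test statistic is coef / std_err and that it is
    compared against a standard normal distribution.
    """
    z = coef / std_err
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Residence Length: coefficient 0.024680, standard error 0.013800.
z_value, p_value = wald_z_and_p(0.024680, 0.013800)
print(f"z = {z_value:.3f}, p = {p_value:.4f}")
```

If the software uses a different test (for example, a likelihood-ratio test), the reported values will not match this calculation exactly.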
The standard errors can be used to construct confidence intervals for the regression coefficients. It is not my intention to repeat a course on basic statistics here, so I am not going to either derive or be precise about these confidence intervals. However, roughly speaking, going plus or minus 2 times the standard error from the regression coefficient gives approximately a 95% confidence interval for the coefficient.
For example, for Residence Length, the regression coefficient is 0.024680. The next column gives the standard error of the regression coefficient, which is 0.013800. Thus an approximate 95% confidence interval for the Residence Length regression coefficient is:
$$0.024680 \pm 2(0.013800) = 0.024680 \pm 0.027600 = (-0.00292,\ 0.05228)$$
This means that the regression coefficient for Residence Length could be anywhere from $-0.00292$ to $0.05228$ (with 95% confidence).
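The plus-or-minus-2-standard-errors arithmetic for Residence Length can be sketched in a few lines of Python. This is only an illustration of the rough rule described above. The function name `approx_ci` and its `multiplier` parameter are my own, and the option of passing 1.96 (the usual normal-distribution multiplier for 95%) is an addition for comparison, not something the coefficient table itself reports.

```python
def approx_ci(coef, std_err, multiplier=2.0):
    """Approximate confidence interval: coef +/- multiplier * std_err."""
    half_width = multiplier * std_err
    return coef - half_width, coef + half_width


# Residence Length: coefficient 0.024680, standard error 0.013800.
ci_low, ci_high = approx_ci(0.024680, 0.013800)
print(f"{ci_low:.5f} to {ci_high:.5f}")  # -0.00292 to 0.05228

# Same calculation with the more conventional 1.96 multiplier.
ci_low_196, ci_high_196 = approx_ci(0.024680, 0.013800, multiplier=1.96)
print(f"{ci_low_196:.5f} to {ci_high_196:.5f}")
```

The interval with 1.96 is slightly narrower, but for a rough read of the uncertainty the two give the same picture.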
As I explained in Part 3 of this series, we often use the odds-ratio, which is the exponential of the regression coefficient (i.e., $e^{\text{coefficient}}$), to help to interpret the meaning of the regression coefficient. The odds-ratio for the Residence Length coefficient, as shown in the coefficient table, is 1.0250. This means that there is a 2.5% increase in the odds of buying the Kid Creative magazine associated with each additional year of residence.
We can also compute the odds-ratios corresponding to the ends of the confidence interval. These odds-ratios give us an equivalent confidence interval for the odds-ratio. So continuing the example using Residence Length, the odds-ratios corresponding to the ends of the confidence interval are $e^{-0.00292} = 0.99708$ and $e^{0.05228} = 1.05367$.
Thus, the interval $(0.99708,\ 1.05367)$ is an approximate 95% confidence interval for the odds-ratio. This means that there could be anywhere from a 0.292% decrease to a 5.367% increase in the odds of buying the Kid Creative magazine associated with each additional year of residence.
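The conversion from the coefficient scale to the odds-ratio scale is just an exponential applied to each end of the interval. A minimal Python sketch for Residence Length follows. The variable names and the helper `percent_change` are my own, and the interval uses the rough plus-or-minus-2-standard-errors rule from this article rather than an exact method.

```python
import math

coef = 0.024680     # Residence Length regression coefficient
std_err = 0.013800  # its standard error

# Approximate 95% interval on the coefficient (log-odds) scale.
coef_low = coef - 2 * std_err
coef_high = coef + 2 * std_err

# Exponentiate the estimate and both ends to move to the odds-ratio scale.
or_point = math.exp(coef)
or_low = math.exp(coef_low)
or_high = math.exp(coef_high)


def percent_change(odds_ratio):
    """Percent change in the odds implied by an odds-ratio."""
    return (odds_ratio - 1.0) * 100.0


print(f"odds-ratio: {or_point:.4f} ({or_low:.5f} to {or_high:.5f})")
print(f"percent change in odds: {percent_change(or_low):.3f}% "
      f"to {percent_change(or_high):.3f}%")
```

Because the exponential is an increasing function, the lower end of the coefficient interval always maps to the lower end of the odds-ratio interval. Note also that the odds-ratio interval is not symmetric around 1.0250, even though the coefficient interval is symmetric around 0.024680.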
I have now discussed the main way that the logistic regression coefficient table output is used to assess uncertainty. You may recall that when I discussed the same issue (assessing uncertainty) in my review of least-squares regression, I briefly touched on computing the uncertainty of predictions (prediction intervals). If you want to review that discussion you can see it by clicking here.
In logistic regression, it does not make sense to create a prediction interval for a new observation in the same way that it does in regular least-squares regression. The reason is simply that we know that the outcome of the dependent $y$-variable is either 0 or 1 (and not any other value) for any set of values of the $x$-variables. This, of course, is because the $y$-variable is binary in logistic regression. What we don’t know is the probability. We could think about computing a confidence interval for this probability (which could be done), but not really for the outcome of $y$ (or at least not in any way that parallels what is done in least-squares regression).
The point of this last paragraph is simply this. In logistic regression analysis, we don’t have to worry about prediction intervals. People do not try to compute anything when using logistic regression that is an analog to or resembles prediction intervals in regular least-squares regression. Further, the logistic regression software generally does not have any way to output prediction-type intervals.
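For readers curious about the confidence interval for the probability mentioned above, here is one common way it could be sketched in Python: build a rough interval for the linear predictor (the log-odds) and then pass both ends through the logistic function. This is an illustration and not part of the Kid Creative output. The values `eta=0.5` and `se_eta=0.3` are hypothetical, and in practice the standard error of the linear predictor would have to come from the software's coefficient covariance matrix, which the coefficient table alone does not provide.

```python
import math


def logistic(eta):
    """Convert a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def probability_ci(eta, se_eta, multiplier=2.0):
    """Rough confidence interval for a predicted probability.

    eta is the linear predictor (log-odds) for one observation and
    se_eta is its standard error. The interval is formed on the
    log-odds scale and then transformed, so it always stays in (0, 1).
    """
    low = logistic(eta - multiplier * se_eta)
    high = logistic(eta + multiplier * se_eta)
    return low, logistic(eta), high


# Hypothetical values, chosen only to show the mechanics.
p_low, p_hat, p_high = probability_ci(eta=0.5, se_eta=0.3)
print(f"estimated probability {p_hat:.3f}, "
      f"rough 95% interval {p_low:.3f} to {p_high:.3f}")
```

This is an interval for the probability that the outcome is 1, not an interval for the 0-or-1 outcome itself, so it is not an analog of a least-squares prediction interval.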
This article concludes my discussion of interpreting and using the coefficient table output that is produced by logistic regression software. The coefficient table is the most important and useful part of the logistic regression output. But we do have another huge topic to address, one that is almost as big as the coefficient table. That is assessing how well the logistic regression model fits the data. This issue is called “goodness-of-fit.” It is probably the most important remaining topic that I will address as I continue to develop the major ideas of logistic regression.