Least-Squares Background Part 5 — Assessing Uncertainty

In this article, I discuss how to assess the uncertainty of the regression coefficient estimates and briefly touch on assessing the uncertainty of predictions. This article is the last part of a five-part series reviewing regular linear least-squares regression. As I have explained previously on this website, I will use regular least-squares regression as a reference point for explaining the output of logistic regression analysis. In Part 1 of this series, I discussed the three main sections of the least-squares regression output and provided an example using the Kid Creative data. Parts 2 through 4 discuss important uses of the regression coefficient table, the most important part of any linear least-squares regression output: Part 2 discusses determining which variables matter, Part 3 focuses on assessing the impact of each of the X-variables, and Part 4 discusses making predictions.

Coefficient Table Use #4: Assessing Uncertainty

The coefficient table in the output of any regular linear least-squares regression includes standard errors of the estimated regression coefficients. Usually, these standard errors are in the second column of the table. The standard errors are estimates of the standard deviations of estimated regression coefficients.

The standard errors can be used to construct confidence intervals for the regression coefficients. It is not my intention to repeat a course on linear least-squares regression here, so I am not going to either derive or be precise about these confidence intervals. However, roughly speaking, going plus or minus two times the standard error from the regression coefficient gives approximately a 95% confidence interval for the coefficient.
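As a quick sketch, the plus-or-minus-two-standard-errors rule can be written as a small helper function. The coefficient estimate and standard error used below are hypothetical values, not taken from any particular regression output:

```python
def approx_ci(coef, se, multiplier=2.0):
    """Approximate 95% confidence interval: coefficient +/- 2 standard errors."""
    return (coef - multiplier * se, coef + multiplier * se)

# Hypothetical coefficient estimate (5.0) and standard error (1.2):
low, high = approx_ci(5.0, 1.2)
print(round(low, 2), round(high, 2))  # -> 2.6 7.4
```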

The confidence interval is interpreted as showing the uncertainty about the precise value of the regression coefficient. Roughly speaking, we interpret the confidence interval as indicating a possible range for the true value of the regression coefficient. To be slightly more precise, the confidence interval shows a range of possible values of the regression coefficient that are not inconsistent with the data we observed.

Once again, the regression output from the Kid Creative data provides an example. Below is an excerpt from the complete regression output (given in Part 1) showing just the regression coefficients together with their standard errors:

If we focus our attention on the variable “Unemployed” (which was determined in Part 2 to matter), we see that the estimated expected drop in household income due to being unemployed is $10,663.04. The table also shows that the standard error of this regression coefficient is $4,110.46. This means that an approximate 95% confidence interval for the Unemployed coefficient is

    \[-\$10,663.04 \pm 2 \times \$4,110.46 = -\$10,663.04 \pm \$8,220.92 .\]

Thus, according to the approximate 95% confidence interval, the true regression coefficient for Unemployed could be anywhere from -$18,883.96 to -$2,442.12. So being unemployed results in anywhere from a $2,442.12 to an $18,883.96 reduction in expected household income.
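To double-check the arithmetic, here is the same interval computed in a few lines of Python, using the coefficient estimate and standard error from the output above:

```python
coef = -10663.04  # estimated Unemployed coefficient
se = 4110.46      # its standard error

half_width = 2 * se  # the "plus or minus two standard errors" half-width
low, high = coef - half_width, coef + half_width
print(round(low, 2), round(high, 2))  # -> -18883.96 -2442.12
```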

A similar procedure can be repeated with each of the other regression coefficients to obtain intervals that show the uncertainty in the estimates.

Predictions are made using the least-squares regression line (as discussed in Part 4):

    \[\hat{Y} = \hat{\alpha} + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \cdots + \hat{\beta}_p X_p .\]

Since the equation for this line depends on the estimated regression coefficients \(\hat{\beta}_1, \hat{\beta}_2, \dots, \hat{\beta}_p\), there is a close relationship between the standard error of a prediction and the standard errors of the regression coefficients given in the regression coefficient table. Things are more complicated than this statement implies, however, because, in addition to knowing the standard errors of the regression coefficient estimates, you also need to know their correlations in order to compute the standard error of a prediction. Most regression programs have an option that will compute the standard errors of predictions, so the user generally does not have to be concerned about the details of the computations.
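As a rough sketch of what such programs compute (this is not any particular package's implementation, and the data below are synthetic): both the coefficient standard errors and a prediction's standard error come from the estimated covariance matrix of the coefficients, which captures exactly the correlations mentioned above.

```python
import numpy as np

# Synthetic data for illustration: intercept plus two predictors.
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# Least-squares fit and the estimated coefficient covariance matrix.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - X.shape[1])        # residual variance estimate
cov_beta = s2 * np.linalg.inv(X.T @ X)       # Cov(beta_hat)

# The coefficient table's standard errors are the square roots of the diagonal.
se_coefs = np.sqrt(np.diag(cov_beta))

# The standard error of a predicted mean at a new observation x0 uses the
# whole covariance matrix, not just the diagonal -- this is where the
# correlations between coefficient estimates come in.
x0 = np.array([1.0, 0.5, -1.0])
se_pred_mean = np.sqrt(x0 @ cov_beta @ x0)
print(se_coefs, se_pred_mean)
```

In practice you would let the software do this (e.g., an option for prediction standard errors or prediction intervals); the point of the sketch is only that the prediction standard error is a quadratic form in the coefficient covariance matrix.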

In summary, then, the regression coefficient table includes standard errors of the estimated regression coefficients (i.e., the estimated standard deviations of the \(\hat{\beta}\)'s). These standard errors measure the uncertainty in the regression coefficient estimates and can be used to construct confidence intervals for them. Similarly, there is uncertainty in any predictions made using the estimated regression line. The standard error of such predictions can also be calculated and, in most regression programs, there will be an option that causes these prediction standard errors or prediction confidence intervals to be output.

This concludes the last part of this review of regular linear least-squares regression. My next article will begin to discuss the output from a logistic regression, drawing on your knowledge of regular regression.

Questions or Comments?

Any questions or comments? Please feel free to comment below. I am always looking to improve this material.

I have had to implement a very simple “captcha” field because of spam comments, so be a bit careful about that. Enter your answer as a number (not a word). Also, save a copy of your comment before you submit it in case you make a mistake. You can use Ctrl-A and then Ctrl-C to copy it to the clipboard. Particularly if your comment is long, I would hate for you to lose it.

This entry was posted in Background.
