Least-Squares Background Part 4 — Making Predictions

In this article, I discuss another key use of the regression coefficients: making predictions. This article is Part 4 of a five-part series briefly reviewing some aspects of regular linear least-squares regression. This background will be used to motivate the interpretation of the logistic regression output. Part 1 of this review series discusses the three main sections of the least-squares regression output and provides an example using the Kid Creative data. Part 2 discusses using the regression coefficient table to determine which variables matter. Part 3 discusses assessing the impact of each of the variables using the regression coefficients.

Coefficient Table Use #3: Making Predictions

When we use regular least-squares regression, we obtain a linear equation that is intended to predict the expected (average) value of the dependent Y variable given the values of the X-variables. In notation, we obtain the equation for the least-squares regression line:

    \[\hat{Y} = \hat{\alpha} + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \cdots + \hat{\beta}_p X_p .\]

If we want to predict the value of Y for a given set of values for the X-variables, we can just plug the X-values and the estimated coefficients (the intercept \hat{\alpha} and the \hat{\beta}'s) into the regression equation.
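As a minimal sketch of this plug-in step, the Python function below adds the intercept to the sum of each coefficient times its X-value. The function name `predict` and the numbers in the usage line are hypothetical, chosen only to illustrate the arithmetic; they do not come from the Kid Creative regression.

```python
def predict(intercept, coefficients, x_values):
    """Return intercept + sum of (coefficient * X-value) over all X-variables."""
    if len(coefficients) != len(x_values):
        raise ValueError("need exactly one X-value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, x_values))


# Hypothetical two-variable example: 10 + 2*4 + (-3)*1
y_hat = predict(10.0, [2.0, -3.0], [4.0, 1.0])
print(y_hat)  # 15.0
```

Note that a 0/1 indicator variable set to 0 simply drops its term from the sum, while one set to 1 adds its full coefficient.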

Using the least-squares regression example based on the Kid Creative data discussed in Part 1, suppose I wanted to predict the Household Income for a person with the following characteristics:

  • Gender Male: IsFemale = 0
  • Married: IsMarried = 1
  • College Educated: HasCollege = 1
  • Not a Professional: IsProfessional = 0
  • Not Retired: IsRetired = 0
  • Employed: Unemployed = 0
  • Five years of Residency in Current City: ResLength = 5
  • Dual Income: Dual = 1
  • Has Children: Minors = 1
  • Rents Home: Own = 0
  • Lives in a house: House = 1
  • Race is white: White = 1
  • First language is English: English = 1
  • No previous purchases: PrevParent = 0 and PrevChild = 0

To predict the Household Income for this person, I simply plug these X-values into the least-squares regression equation. To do so, I need the regression coefficients. The table below shows the regression coefficients pulled from the coefficient output table for the Kid Creative Household Income least-squares regression example (click here to see the entire regression output).

Using these regression coefficients and the particular X-values given above, the predicted household income is:

    \[\begin{split} \hat{Y} &= 17151.292 + (-3997.272)\times 0 + 8549.088\times 1 + 7380.698\times 1 \\ &+ 11432.073\times 0 + (-2436.926)\times 0 + (-10633.042)\times 0 + 147.493\times 5 \\ &+ 3183.951\times 1 + (-733.964)\times 1 + 12484.463\times 0 + 3049.848\times 1\\ &+ 7259.981\times 1 + (-4273.756)\times 1 + 6689.962\times 0 + 3839.232\times 0 \end{split}\]

I certainly do not want to type all of this into a calculator, so I am going to compute the prediction using Excel. Here is the section of the Excel worksheet that I used:

Thus, the expected household income predicted by the least-squares regression for a person with the X-variable values listed above is about $42,300.
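The same arithmetic can also be checked outside of Excel. The following is a minimal Python sketch, not the worksheet used above. The intercept, coefficients, and X-values are copied from the prediction equation; the pairing of each variable name with a coefficient is an assumption based on the order of the terms in that equation, and the order of the last two (PrevParent and PrevChild) is a guess that does not affect the result because both X-values are 0.

```python
# Intercept and (name, coefficient, X-value) terms copied from the equation above.
# The variable names attached to each coefficient are assumed from term order.
intercept = 17151.292
terms = [
    ("IsFemale",       -3997.272, 0),
    ("IsMarried",       8549.088, 1),
    ("HasCollege",      7380.698, 1),
    ("IsProfessional", 11432.073, 0),
    ("IsRetired",      -2436.926, 0),
    ("Unemployed",    -10633.042, 0),
    ("ResLength",        147.493, 5),
    ("Dual",            3183.951, 1),
    ("Minors",          -733.964, 1),
    ("Own",            12484.463, 0),
    ("House",           3049.848, 1),
    ("White",           7259.981, 1),
    ("English",        -4273.756, 1),
    ("PrevParent",      6689.962, 0),  # order of these last two is a guess
    ("PrevChild",       3839.232, 0),
]

# Predicted value = intercept + sum of coefficient * X-value.
y_hat = intercept + sum(coef * x for _, coef, x in terms)
print(f"{y_hat:.3f}")  # 42304.603
```

Rounded to the nearest hundred dollars, this agrees with the figure of about $42,300.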

Prediction is one of the most important uses of the regression coefficient table. In the last part of this background series, I will discuss assessing the statistical uncertainty in the regression coefficients and will briefly touch on the uncertainty in predictions. Click here to proceed to Part 5.

Questions or Comments?

Any questions or comments? Please feel free to comment below. I am always looking to improve this material.


One Response to Least-Squares Background Part 4 — Making Predictions

  1. Martyn Cook says:

    In the example shown there is a negative effect from the coefficient relating to being a female (which is not activated since it’s multiplied by 0). However the positive effect of being male (as shown in previous examples) is not present as it has not been chosen as an X criteria. My question is, would you normally choose to have both mutually exclusive X criteria in the equation and if not how do you decide which to include?
