Background Part 2 — Uses of the Least-Squares Regression Coefficients and Determining Which Variables Matter

In this article, I am going to focus on the most important section of the regression output, namely the table of regression coefficients and the statistics that accompany them. This article is Part 2 of a five-part series reviewing regular least-squares regression. In Part 1, I briefly discussed the three sections of the least-squares regression output (the coefficient table, the goodness of fit, and the ANOVA) using data from the Kid Creative example.

The least-squares regression coefficients and the regression coefficient table are used primarily for four key purposes:

  1. To determine which X-variables matter.
  2. To determine the effects of each of the X-variables.
  3. To compute the regression equation in order to make predictions.
  4. To assess the uncertainty about the regression coefficients (and predictions).

I will now discuss these uses one at a time, starting with the first, in the context of regular linear least-squares regression using the Kid Creative Household Income example.

Coefficient Table Use #1: Which Variables Matter? Interpreting the p-Values

To determine which X-variables matter, we generally look at each coefficient’s p-value. We could look at the t-statistics instead, but the p-values are simpler to read. The smaller a coefficient’s p-value, the more evidence there is that the corresponding X-variable matters. To say that an X-variable matters is to say that there is evidence that the variable has an effect on Y.
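For reference, the standard least-squares relationship between a coefficient estimate, its t-statistic, and its two-sided p-value is written out below. The notation is assumed here rather than taken from the Kid Creative output: b_j is the estimated coefficient for the j-th X-variable, SE(b_j) is its standard error, n is the number of observations, k is the number of X-variables (plus an intercept), and T with n - k - 1 degrees of freedom is a t-distributed random variable under the null hypothesis.

```latex
H_0:\ \beta_j = 0, \qquad
t_j = \frac{b_j}{\operatorname{SE}(b_j)}, \qquad
p_j = 2\,\Pr\!\left(T_{n-k-1} \ge \lvert t_j \rvert\right)
```

Because the p-value is a decreasing function of the absolute t-statistic, the two carry the same information; the p-value simply puts it on a common 0-to-1 scale.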

Specifically, when the p-value for a coefficient is less than the significance level α (usually α = 0.05, or 5%), we take this to mean that there is evidence that the X-variable matters. More precisely, when the p-value is less than α, we reject the null hypothesis that the regression coefficient is zero; that is, there is statistical evidence that the regression coefficient is not 0. Note that saying a variable does not matter is the same as saying that the regression coefficient for that variable is zero (i.e., β = 0). Thus, when I say that an X-variable matters, I mean that there is evidence of a real association between that X-variable and the Y-variable, Household Income.
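The decision rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the software that produced the Kid Creative output: the function name coefficient_matters is hypothetical, the coefficient and standard error values are made up, and the p-value uses a standard normal approximation to the t-distribution, which is reasonable only when the number of observations is large relative to the number of X-variables.

```python
from statistics import NormalDist


def coefficient_matters(coef, std_err, alpha=0.05):
    """Return (t_stat, p_value, matters) for one regression coefficient.

    Assumption: the two-sided p-value uses the standard normal distribution
    as a large-sample stand-in for the t-distribution with n - k - 1 df.
    """
    t_stat = coef / std_err
    # Two-sided p-value: chance of a |t| at least this large if beta = 0
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    # Reject H0: beta = 0 (the X-variable "matters") when p < alpha
    return t_stat, p_value, p_value < alpha


# Hypothetical numbers, not taken from the Kid Creative output
print(coefficient_matters(coef=5000.0, std_err=2000.0))  # t = 2.5, p about 0.012
print(coefficient_matters(coef=1000.0, std_err=2000.0))  # t = 0.5, p about 0.62
```

In the first call the p-value falls below 0.05, so the variable would be said to matter; in the second it does not.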

The table below shows just the regression coefficients and their corresponding p-values from the Kid Creative least-squares regression of Household Income on the X-variables discussed in Part 1.

Looking at the table, we see that there is evidence that 9 of the variables matter:

  • IsFemale — strong evidence: p-value = 0.6%
  • IsMarried — very strong evidence: p-value = 0.0%
  • HasCollege — very strong evidence: p-value = 0.0%
  • IsProfessional — very strong evidence: p-value = 0.0%
  • Unemployed — strong evidence: p-value = 1.0%
  • ResLength — strong evidence: p-value = 0.8%
  • Own — very strong evidence: p-value = 0.0%
  • White — very strong evidence: p-value = 0.0%
  • PrevChild (= 1 if previously purchased a children’s magazine) — strong evidence: p-value = 1.5%

There is no evidence that the following variables matter:

  • Dual — no evidence: p-value = 16.9%
  • Children — no evidence: p-value = 65.2%
  • PrevParent — no evidence: p-value = 13.5%

There is some suggestion that the following variables might matter:

  • House — weak evidence: p-value = 8.9%
  • English — very weak evidence: p-value = 11.0%
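As a quick check of the lists above, the following Python sketch applies the p < α rule at α = 0.05 to the p-values exactly as reported, written as proportions. The dictionary and the function split_by_significance are hypothetical names used only for illustration. Values reported as 0.0% are rounded in the output, so 0.000 here just means "below 0.05%". The single 5% cut-off reproduces the nine variables that matter; the finer labels (weak, very weak) given to House and English are judgment calls that one threshold does not capture.

```python
# p-values (as proportions) copied from the lists above
p_values = {
    "IsFemale": 0.006, "IsMarried": 0.000, "HasCollege": 0.000,
    "IsProfessional": 0.000, "Unemployed": 0.010, "ResLength": 0.008,
    "Own": 0.000, "White": 0.000, "PrevChild": 0.015,
    "Dual": 0.169, "Children": 0.652, "PrevParent": 0.135,
    "House": 0.089, "English": 0.110,
}

ALPHA = 0.05  # conventional significance level


def split_by_significance(p_values, alpha):
    """Split variable names into those with p < alpha and the rest."""
    matters = sorted(name for name, p in p_values.items() if p < alpha)
    not_significant = sorted(name for name, p in p_values.items() if p >= alpha)
    return matters, not_significant


matters, not_significant = split_by_significance(p_values, ALPHA)
print(len(matters), matters)
print(not_significant)
```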

Determining which variables matter is one of the most important uses of the regression coefficients and their associated statistics. It is often a central part of the model-building process, that is, the process of deciding which X-variables should be included in the regression equation and which should be omitted.

In the next part of this background series (Part 3), I discuss another very important use of the regression coefficients: assessing the effect of each of the X-variables.

Questions or Comments?

Any questions or comments? Please feel free to comment below. I am always looking to improve this material.
