The least-squares regression coefficients and the regression coefficient table are used primarily for four key purposes:
- To determine which x-variables matter.
- To determine the effects of each of the x-variables.
- To compute the regression equation in order to make predictions.
- To assess the uncertainty about the regression coefficients (and predictions).
I will now discuss each of these uses in the context of regular linear least-squares regression using the KidCreative Household Income example.
Coefficient Table Use #1: Which Variables Matter? Interpreting the p-Values
To determine which x-variables matter, we generally look at each coefficient’s p-value. We could look at the t-statistics, but it is simpler to look at the p-values. The smaller a coefficient’s p-value, the more evidence there is that the corresponding x-variable matters. To say that an x-variable matters is to say that there is evidence that the variable has an effect on the y-variable.
Specifically, when the p-value for a coefficient is less than the significance level α (usually 0.05, or 5%), we take this to mean that there is evidence that the x-variable matters. More precisely still, when the p-value is less than the significance level α, the null hypothesis that the regression coefficient is zero is rejected (meaning that there is statistical evidence that the regression coefficient is not 0). Note that to say that a variable does not matter is the same thing as saying that the regression coefficient for that variable is zero (i.e., β = 0). Thus, when I say that an x-variable matters, I mean that there is evidence of a real association between that x-variable and the y-variable, Household Income.
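In symbols, the test behind each p-value can be sketched as follows. The notation is my own shorthand and is an assumption, not something taken from the regression output: b_j is the estimated coefficient for the j-th x-variable, β_j is the corresponding true coefficient, SE(b_j) is its standard error, n is the number of observations, k is the number of x-variables, and α is the significance level.

```latex
\begin{aligned}
H_0 &: \beta_j = 0 \quad \text{versus} \quad H_1 : \beta_j \neq 0 \\
t_j &= \frac{b_j}{\mathrm{SE}(b_j)} \\
p_j &= P\bigl(\lvert T \rvert \ge \lvert t_j \rvert\bigr), \qquad T \sim t_{\,n-k-1} \\
\text{Reject } H_0 &\text{ when } p_j < \alpha \quad (\text{typically } \alpha = 0.05)
\end{aligned}
```

This is why looking at the p-values and looking at the t-statistics lead to the same conclusions: a larger t-statistic in absolute value always gives a smaller p-value.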
The table below shows just the regression coefficients and their corresponding p-values from the KidCreative least-squares regression of Household Income on the x-variables discussed in Part 1. (Click here to see the entire regression output).
Looking at the table, we see that there is evidence that 9 of the variables matter:
- IsFemale — strong evidence: p-value = 0.6%
- IsMarried — very strong evidence: p-value = 0.0%
- HasCollege — very strong evidence: p-value = 0.0%
- IsProfessional — very strong evidence: p-value = 0.0%
- Unemployed — strong evidence: p-value = 1.0%
- ResLength — strong evidence: p-value = 0.8%
- Own — very strong evidence: p-value = 0.0%
- White — very strong evidence: p-value = 0.0%
- PrevChild — strong evidence: p-value = 1.5% (PrevChild = 1 if previously purchased a children’s magazine)
There is no evidence that the following variables matter:
- Dual — no evidence: p-value = 16.9%
- Children — no evidence: p-value = 65.2%
- PrevParent — no evidence: p-value = 13.5%
There is some suggestion that the following variables might matter:
- House — weak evidence: p-value = 8.9%
- English — very weak evidence: p-value = 11.0%
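The decision rule used to sort the variables above can be sketched in a few lines of Python. This is only an illustration: the names `P_VALUES`, `ALPHA`, and `split_by_significance` are hypothetical, and the p-values are simply the rounded percentages from the lists above, typed in as fractions. It applies nothing more than the "p-value less than the significance level" rule at the 5% level.

```python
# p-values copied from the lists above, written as fractions (0.6% -> 0.006).
# Values reported as 0.0% are rounded, so they are stored here as 0.0.
P_VALUES = {
    "IsFemale": 0.006,
    "IsMarried": 0.000,
    "HasCollege": 0.000,
    "IsProfessional": 0.000,
    "Unemployed": 0.010,
    "ResLength": 0.008,
    "Own": 0.000,
    "White": 0.000,
    "PrevChild": 0.015,
    "Dual": 0.169,
    "Children": 0.652,
    "PrevParent": 0.135,
    "House": 0.089,
    "English": 0.110,
}

ALPHA = 0.05  # assumed significance level (the usual 5%)


def split_by_significance(p_values, alpha=ALPHA):
    """Return (significant, not_significant) lists of variable names."""
    significant = [name for name, p in p_values.items() if p < alpha]
    not_significant = [name for name, p in p_values.items() if p >= alpha]
    return significant, not_significant


significant, not_significant = split_by_significance(P_VALUES)
print("Evidence at the 5% level:", significant)
print("No evidence at the 5% level:", not_significant)
```

At the 5% level this rule flags the same nine variables listed above. House and English fall on the "not significant" side, which is why I describe the evidence for them only as weak or very weak.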
Determining which variables matter is one of the most important uses of the regression coefficients and their associated statistics. It is often a central part of a model building process; that is, a process to determine which x-variables should be in the regression equation and which should be omitted.
In the next part of this background series (Part 3), I discuss another very important use of regression coefficients: assessing the impact of each of the x-variables.