Coefficient Table Use #1: Determining Which X-Variables Matter
You can determine which variables matter in logistic regression by looking at the p-values of the coefficients. This is done in exactly the same way as for regular least-squares regression, as discussed in Part 2 of the review of regular least-squares regression.
The coefficient table output for the Kid Creative logistic regression is shown below:
The p-values are in the fourth numerical column, which is labeled Pr(>|z|). X-variables with p-values that are less than 5% would generally be considered significant, meaning that there is statistical evidence that they affect the probability that the Y-variable is 1 (i.e., that the customer buys the Kid Creative magazine). More generally, for a given significance level α, a variable is significant at the α level of significance if its p-value is less than α.
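To make the Pr(>|z|) column more concrete, here is a minimal sketch of how such a p-value is usually obtained. It assumes the column reports a two-sided test on a z-statistic (the coefficient estimate divided by its standard error), which is what the label suggests. The function names and the example estimate and standard error are hypothetical and are not taken from the Kid Creative output.

```python
import math


def two_sided_p_value(estimate, std_error):
    """Two-sided p-value for a z-statistic, i.e. P(|Z| > |z|) for a standard normal Z."""
    z = estimate / std_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def is_significant(p_value, alpha=0.05):
    """Apply the rule from the text: significant at level alpha if p-value < alpha."""
    return p_value < alpha


# Hypothetical numbers, for illustration only.
p = two_sided_p_value(estimate=0.9, std_error=0.4)
print(f"p-value: {p:.4f}")
print("significant at 5%:", is_significant(p))
print("significant at 1%:", is_significant(p, alpha=0.01))
```

The same coefficient can be significant at one level and not at another, which is why the significance level α should be stated alongside the conclusion.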
If we examine the p-values in the logistic regression output above, we see that the following variables are significant at the 5% level of significance:
- Income: The p-value is 0.0000 (to four decimal places). This means that there is extremely strong statistical evidence that income is related to the probability that a customer buys (i.e., that the coefficient on Income is not zero).
- IsFemal: The p-value is 0.0004. Thus there is extremely strong statistical evidence that a customer is more likely to buy if they are female.
- Minors: The p-value is 0.0145. This means that there is strong statistical evidence that a customer is more likely to buy if there are minors under age 18 in the household.
- White: The p-value is 0.0006. There is extremely strong statistical evidence that a white customer is more likely to buy than a customer who is not white.
- PrevChildMag: The p-value is 0.0287. Thus there is strong statistical evidence that a customer who has previously purchased a magazine oriented towards children is more likely to purchase the Kid Creative magazine.
If we relax the significance level a bit (that is, also consider p-values somewhat greater than 5%), we see that the following additional variables may also matter:
- ResidenceLength: The p-value is 0.0738. This X-variable is significant at the 10% level of significance, which means that there is some statistical evidence that the longer a customer has lived in their home, the more likely they are to buy.
- Own: The p-value is 0.0590. This is basically 6%, which is just higher than the 5% significance level that is often used as a cutoff. Thus, there is statistical evidence (approaching strong statistical evidence) that a customer who owns their own home is more likely to buy.
- English: The p-value is 0.0687 or about 7%. Thus, there is some evidence that speaking English as the first language in the household means that a customer is more likely to buy. (This is not surprising, as the Kid Creative magazine is in English.)
There is no statistical evidence that the following variables matter:
- IsMarried: The p-value is 0.3343 or about 33%. There is no statistical evidence that being married matters.
- HasCollege: The p-value is 0.5290 or about 53%. There is no statistical evidence that having some college education matters.
- IsProfessional: The p-value is 0.6280 or about 63%. There is no statistical evidence that being a professional matters.
- IsRetired: The p-value is 0.2140 or about 21%. There is no statistical evidence that being retired affects the probability that a customer buys.
- Unemployed: The p-value is 0.8330 or about 83%. There is no statistical evidence that being unemployed affects the probability that a customer buys.
- DualIncome: The p-value is 0.3863 or about 39%. There is no statistical evidence that the household having dual income (two or more adults with income) matters.
- House: The p-value is 0.1362 or about 14%. There is no credible statistical evidence that living in a house (as opposed to an apartment) matters.
- PrevParentMag: The p-value is 0.4439 or about 44%. There is no statistical evidence that having previously purchased a magazine on parenting from the company matters.
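The three groupings above can be reproduced mechanically from the quoted p-values. The sketch below is only an illustration: the dictionary holds the p-values exactly as quoted in the lists above, while the `classify` function and its group labels are hypothetical names introduced here, with 5% and 10% as the two cutoffs used in the text.

```python
# p-values as quoted in the lists above (rounded to four decimal places).
p_values = {
    "Income": 0.0000, "IsFemal": 0.0004, "Minors": 0.0145, "White": 0.0006,
    "PrevChildMag": 0.0287, "ResidenceLength": 0.0738, "Own": 0.0590,
    "English": 0.0687, "IsMarried": 0.3343, "HasCollege": 0.5290,
    "IsProfessional": 0.6280, "IsRetired": 0.2140, "Unemployed": 0.8330,
    "DualIncome": 0.3863, "House": 0.1362, "PrevParentMag": 0.4439,
}


def classify(p_values, strict=0.05, relaxed=0.10):
    """Sort variables into three groups using the two cutoffs from the text."""
    groups = {"significant at 5%": [], "significant at 10% only": [], "no evidence": []}
    for name, p in p_values.items():
        if p < strict:
            groups["significant at 5%"].append(name)
        elif p < relaxed:
            groups["significant at 10% only"].append(name)
        else:
            groups["no evidence"].append(name)
    return groups


groups = classify(p_values)
for label, names in groups.items():
    print(f"{label}: {', '.join(names)}")
```

Changing `strict` or `relaxed` shows how sensitive the groupings are to the chosen cutoffs; for example, Own (0.0590) moves into the first group if the strict cutoff is raised to 6%.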
Before closing this article, I want to remind you of a couple of things. First, to say that an X-variable does not matter means that the corresponding regression coefficient (the corresponding
The next part of this series (Part 3) discusses assessing the impact of each of the variables.