**Understanding Logistic Regression Coefficient Output** (5-part series)

- Part 1: How the logistic regression coefficient table compares to the corresponding table from least-squares regression.
- Part 2: Coefficient table use #1 — determining which variables in the logistic regression matter.
- Part 3: Coefficient table use #2 — assessing the impact of each of the X-variables on the dependent variable (actually, …).
- Part 4: Coefficient table use #3 — predicting the probability that the dependent variable is 1.
- Part 5: Coefficient table use #4 — assessing the uncertainty in the regression coefficients.

## Coefficient Table Use #1: Determining Which X-Variables Matter

You can determine which variables matter in logistic regression by looking at the p-values of the coefficients. This is done in exactly the same way as for regular least-squares regression, as discussed in Part 2 of the review of regular least-squares regression.

The coefficient table output for the Kid Creative logistic regression is shown below. The p-values are in the fourth numerical column, which is labeled Pr(>|z|). X-variables with p-values less than 5% would generally be considered significant, meaning that there is statistical evidence that they affect the probability that the dependent variable is 1 (i.e., that the customer buys the Kid Creative magazine). More generally, for a given significance level α, a variable is significant at the α level of significance if its p-value is less than α.
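The column label Pr(>|z|) suggests the table comes from R's `glm` summary, where each p-value is a two-sided Wald test: z is the estimate divided by its standard error, and the p-value is 2(1 − Φ(|z|)) for the standard normal CDF Φ. Assuming that is how the table was produced, a minimal Python sketch of the calculation and of the comparison with a significance level (called `alpha` here) looks like this. The estimate and standard error in the example are hypothetical, not values from the Kid Creative table.

```python
import math


def wald_p_value(estimate: float, std_error: float) -> float:
    """Two-sided p-value for H0: coefficient = 0, using the Wald z statistic."""
    z = estimate / std_error
    # 2 * (1 - Phi(|z|)) is identical to erfc(|z| / sqrt(2)) for the standard normal CDF Phi
    return math.erfc(abs(z) / math.sqrt(2))


def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """A variable is significant at level alpha if its p-value is below alpha."""
    return p_value < alpha


# Hypothetical coefficient: estimate 0.80 with standard error 0.25, so z = 3.2
p = wald_p_value(0.80, 0.25)
print(f"p-value = {p:.4f}")
print("significant at 5%:", is_significant(p))
print("significant at 0.1%:", is_significant(p, alpha=0.001))
```

Using `erfc` directly avoids the loss of precision that `1 - Phi(|z|)` can suffer when the p-value is very small.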

If we examine the p-values in the logistic regression output above, we see that the following variables are significant at the 5% level of significance:

- Income: The p-value is 0.0000. This means that there is extremely strong statistical evidence that income is related to the probability that a customer buys (i.e., that the dependent variable is 1).
- IsFemale: The p-value is 0.0004. Thus, there is extremely strong statistical evidence that a female customer is more likely to buy.
- Minors: The p-value is 0.0145. This means that there is strong statistical evidence that a customer is more likely to buy if there are minors (under age 18) in the household.
- White: The p-value is 0.0006. There is extremely strong statistical evidence that a white customer is more likely to buy than a customer who is not white.
- PrevChildMag: The p-value is 0.0287. Thus, there is strong statistical evidence that a customer who has previously purchased a magazine oriented toward children is more likely to purchase the Kid Creative magazine.

If we relax the significance level a bit (that is, also consider p-values somewhat greater than 5%), we see that the following additional variables may also matter:

- ResidenceLength: The p-value is 0.0738. This X-variable is significant at the 10% level of significance, which means that there is some statistical evidence that the longer a customer has lived in their home, the more likely they are to buy.
- Own: The p-value is 0.0590, or about 6%, which is just above the 5% significance level often used as a cutoff. Thus, there is statistical evidence (approaching strong statistical evidence) that a customer who owns their home is more likely to buy.
- English: The p-value is 0.0687, or about 7%. Thus, there is some evidence that a customer whose household speaks English as its first language is more likely to buy. (This is not surprising, as the Kid Creative magazine is in English.)
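The wording used in the two lists above (extremely strong, strong, some, no evidence) tracks the size of the p-value. One reading of that wording can be written as a small lookup. The cut-offs of 0.01, 0.05, and 0.10 are an assumption inferred from how the examples are described, a common rule of thumb rather than a formal standard.

```python
def evidence_label(p_value: float) -> str:
    """Map a p-value to informal wording about the strength of evidence.

    Assumed cut-offs (a rule of thumb, not a standard): below 0.01 is
    "extremely strong", below 0.05 is "strong", below 0.10 is "some".
    """
    if p_value < 0.01:
        return "extremely strong"
    if p_value < 0.05:
        return "strong"
    if p_value < 0.10:
        return "some"
    return "no"


# p-values quoted in the lists above
for name, p in [("IsFemale", 0.0004), ("Minors", 0.0145), ("Own", 0.0590), ("English", 0.0687)]:
    print(f"{name}: p = {p:.4f}, {evidence_label(p)} evidence")
```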

There is no statistical evidence that the following variables matter:

- IsMarried: The p-value is 0.3343, or about 33%. There is no statistical evidence that being married matters.
- HasCollege: The p-value is 0.5290, or about 53%. There is no statistical evidence that having some college education matters.
- IsProfessional: The p-value is 0.6280, or about 63%. There is no statistical evidence that being a professional matters.
- IsRetired: The p-value is 0.2140, or about 21%. There is no statistical evidence that being retired affects the probability that a customer buys.
- Unemployed: The p-value is 0.8330, or about 83%. There is no statistical evidence that being unemployed affects the probability that a customer buys.
- DualIncome: The p-value is 0.3863, or about 39%. There is no statistical evidence that the household having dual income (two or more adults with income) matters.
- House: The p-value is 0.1362, or about 14%. There is no credible statistical evidence that living in a house (as opposed to an apartment) matters.
- PrevParentMag: The p-value is 0.4439, or about 44%. There is no statistical evidence that having previously purchased a parenting magazine from the company matters.
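The three groups above follow mechanically from two thresholds: 5% for "significant" and 10% for "may also matter". The sketch below reproduces that grouping from the p-values quoted in this article; the function name and group labels are illustrative, not part of any library.

```python
# p-values as quoted above from the Kid Creative coefficient table
P_VALUES = {
    "Income": 0.0000, "IsFemale": 0.0004, "Minors": 0.0145, "White": 0.0006,
    "PrevChildMag": 0.0287, "ResidenceLength": 0.0738, "Own": 0.0590,
    "English": 0.0687, "IsMarried": 0.3343, "HasCollege": 0.5290,
    "IsProfessional": 0.6280, "IsRetired": 0.2140, "Unemployed": 0.8330,
    "DualIncome": 0.3863, "House": 0.1362, "PrevParentMag": 0.4439,
}


def group_by_significance(p_values, strict=0.05, relaxed=0.10):
    """Split variables into significant / marginal / not significant groups,
    listing each group in increasing order of p-value."""
    groups = {"significant": [], "marginal": [], "not significant": []}
    for name, p in sorted(p_values.items(), key=lambda item: item[1]):
        if p < strict:
            groups["significant"].append(name)
        elif p < relaxed:
            groups["marginal"].append(name)
        else:
            groups["not significant"].append(name)
    return groups


for label, names in group_by_significance(P_VALUES).items():
    print(f"{label}: {', '.join(names)}")
```

Changing `strict` or `relaxed` shows how the list of variables that "matter" depends on the significance level you choose.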

Before closing this article, I want to remind you of a couple of things. First, to say that an X-variable does not matter means that the corresponding regression coefficient (the corresponding …

The next part of this series (Part 3) discusses assessing the impact of each of the variables.

Hello

Thanks for this site. I have a question. In your example, Income has a very low beta_i, i.e. 0.0002 and an odds ratio close to 1. I would view that as suggesting income has practically no influence on the dependent variable.

But if the z value and p value are analyzed, the p-value is 0.0000, which means that the null hypothesis that the coefficient for income = 0 is strongly rejected. In other words, there is very high confidence that the coefficient for income is not 0, and in fact 0.0002 is a reliable estimate of the coefficient.

But, should we not use a combination of the actual coefficient value (being “very” different from 0) or odds ratio (being “very” different from 1) and the p-value to decide which variables are explanatory?

For example, for income I see that there is high confidence that the coefficient = 0.0002 and the odds ratio = 1.0002. I see that as implying income is not an explanatory variable, but the p-value suggests otherwise.

Thanks for clarifying this,

Hari
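The odds ratio Hari quotes can be reproduced directly: for a logistic regression coefficient β, the odds that the dependent variable is 1 are multiplied by exp(β) when X increases by one unit, and by exp(β·Δ) when X increases by Δ units. The sketch below uses the rounded coefficient of 0.0002 from the comment and assumes, since the units are not stated here, that Income is measured in dollars; the $10,000 change is a hypothetical illustration.

```python
import math


def odds_ratio(beta: float, delta: float = 1.0) -> float:
    """Factor by which the odds that Y = 1 change when X increases by `delta` units."""
    return math.exp(beta * delta)


beta_income = 0.0002  # rounded coefficient quoted in the comment above

print(round(odds_ratio(beta_income), 4))          # one-unit (assumed $1) increase: 1.0002
print(round(odds_ratio(beta_income, 10_000), 2))  # hypothetical $10,000 increase: 7.39
```

Under that units assumption, the per-unit odds ratio near 1 reflects how finely income is measured as much as how strong its effect is: the p-value addresses whether the coefficient differs from zero, while exp(β·Δ) for a meaningful Δ describes the size of the effect.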

Hi, I'm Deana. I just want to ask: what if we only had one X-variable? I'm working with a rainfall data series, and I want to know how significant the number of rain days is for the amount of rainfall.