Understanding Logistic Regression Output:
Part 2 — Which Variables Matter

This article discusses one of the most important uses of the coefficient table: determining which variables matter. It is the second part of a five-part series on the coefficient table in logistic regression output and its uses. Click here to see a brief summary of the focus of each of the five parts.

Coefficient Table Use #1: Determining Which X-Variables Matter

You can determine which variables matter in logistic regression by looking at the p-values of the coefficients. This is done in exactly the same way as for regular least-squares regression, as discussed in Part 2 of the review of regular least-squares regression.

The coefficient table output for the Kid Creative logistic regression is shown below. The p-values are in the fourth numerical column, which is labeled Pr(>|z|). X-variables with p-values less than 5% would generally be considered significant, meaning that there is statistical evidence that they affect the probability that the Y-variable is 1 (i.e., that the customer buys the Kid Creative magazine). More generally, for a given significance level \alpha, a variable is significant at the \alpha level of significance if its p-value is less than \alpha.
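As a minimal sketch of what the Pr(>|z|) column measures, assuming the usual Wald test that R's glm summary reports for a logistic model: z is the coefficient estimate divided by its standard error, and the p-value is the two-sided probability that a standard normal variable lies beyond |z|. The function names and the numbers in the example are hypothetical and are not taken from the Kid Creative table.

```python
import math


def wald_p_value(estimate, std_error):
    """Two-sided p-value for H0: beta = 0 from a coefficient and its standard error."""
    z = estimate / std_error
    # P(|Z| > |z|) for a standard normal Z equals 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))


def is_significant(p_value, alpha=0.05):
    """A coefficient is significant at level alpha when its p-value is less than alpha."""
    return p_value < alpha


# Hypothetical coefficient: estimate 0.6 with standard error 0.2, so z is about 3
p = wald_p_value(0.6, 0.2)
print(round(p, 4))                     # about 0.0027
print(is_significant(p))               # True at the 5% level
print(is_significant(p, alpha=0.001))  # False at the 0.1% level
```

Statistical software reports this value directly; the sketch is only meant to show how the p-value and the significance level \alpha fit together.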

If we examine the p-values in the logistic regression output above, we see that the following variables are significant at the 5% level of significance:

  • Income: The p-value is 0.0000. This means that there is extremely strong statistical evidence that income is related to the probability that a customer buys (i.e., that Y=1).
  • IsFemale: The p-value is 0.0004. Thus there is extremely strong statistical evidence that a customer is more likely to buy if she is female.
  • Minors: The p-value is 0.0145. This means that there is strong statistical evidence that a customer is more likely to buy if there are minors (under age 18) in the household.
  • White: The p-value is 0.0006. There is extremely strong statistical evidence that a white customer is more likely to buy than a customer who is not white.
  • PrevChildMag: The p-value is 0.0287. Thus there is strong statistical evidence that a customer who has previously purchased a magazine oriented towards children is more likely to purchase the Kid Creative magazine.

If we relax the significance level a bit (that is, also count p-values somewhat above 5%, up to 10%), we see that the following additional variables may also matter:

  • ResidenceLength: The p-value is 0.0738. This X-variable is significant at the 10% level of significance, which means that there is some statistical evidence that the longer a customer has lived in their home, the more likely they are to buy.
  • Own: The p-value is 0.0590. This is about 6%, just above the 5% significance level that is often used as a cutoff. Thus, there is statistical evidence (approaching strong statistical evidence) that a customer who owns their own home is more likely to buy.
  • English: The p-value is 0.0687, or about 7%. Thus, there is some statistical evidence that a customer in a household where English is the first language is more likely to buy. (This is not surprising, as the Kid Creative magazine is in English.)

There is no statistical evidence that the following variables matter:

  • IsMarried: The p-value is 0.3343 or about 33%. There is no statistical evidence that being married matters.
  • HasCollege: The p-value is 0.5290 or about 53%. There is no statistical evidence that having some college education matters.
  • IsProfessional: The p-value is 0.6280 or about 63%. There is no statistical evidence that being a professional matters.
  • IsRetired: The p-value is 0.2140 or about 21%. There is no statistical evidence that being retired affects the probability that a customer buys.
  • Unemployed: The p-value is 0.8330 or about 83%. There is no statistical evidence that being unemployed affects the probability that a customer buys.
  • DualIncome: The p-value is 0.3863 or about 39%. There is no statistical evidence that the household having dual income (two or more adults with income) matters.
  • House: The p-value is 0.1362 or about 14%. There is no credible statistical evidence that living in a house (as opposed to an apartment) matters.
  • PrevParentMag: The p-value is 0.4439 or about 44%. There is no statistical evidence that having previously purchased a magazine on parenting from the company matters.
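The grouping above is mechanical once the p-values are known. The sketch below is illustrative only: it hard-codes the p-values quoted in this article (the dictionary and function names are hypothetical) and sorts each variable into the same three groups, using 5% and 10% as the cutoffs.

```python
# p-values as quoted in the text for the Kid Creative logistic regression
P_VALUES = {
    "Income": 0.0000, "IsFemale": 0.0004, "Minors": 0.0145, "White": 0.0006,
    "PrevChildMag": 0.0287, "ResidenceLength": 0.0738, "Own": 0.0590,
    "English": 0.0687, "IsMarried": 0.3343, "HasCollege": 0.5290,
    "IsProfessional": 0.6280, "IsRetired": 0.2140, "Unemployed": 0.8330,
    "DualIncome": 0.3863, "House": 0.1362, "PrevParentMag": 0.4439,
}


def tier(p_value):
    """Assign a p-value to one of the three groups used in the article."""
    if p_value < 0.05:
        return "significant at 5%"
    if p_value < 0.10:
        return "significant at 10% only"
    return "not significant at 10%"


def group_by_tier(p_values):
    """Group variable names by tier, listing the smallest p-values first."""
    groups = {}
    for name, p in sorted(p_values.items(), key=lambda item: item[1]):
        groups.setdefault(tier(p), []).append(name)
    return groups


for label, names in group_by_tier(P_VALUES).items():
    print(f"{label}: {', '.join(names)}")
```

Changing the cutoffs in tier is all it takes to see how the conclusions shift with the chosen significance level.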

Before closing this article, I want to remind you of a couple of things. First, to say that an X-variable does not matter means that the corresponding regression coefficient (the corresponding \beta) is 0. Thus, the above discussion about which variables matter is really a discussion about whether statistical hypothesis tests provide evidence that the corresponding regression coefficient is not 0 (\beta=0 is the null hypothesis). A large p-value does not prove that \beta=0; it only means that the data provide no evidence against it.

Second, regression coefficients assess the impact of an X-variable conditional on the other variables in the regression equation. Thus, when we say that there is no statistical evidence that whether or not the customer is employed is associated with the probability that they buy (in this regression), what this means is that there is no evidence that this variable matters above and beyond the other variables (which include variables such as income that may already capture some of the impact of employment status). If you were to run a logistic regression of purchase behavior on employment status alone, you might get a very different result.
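To illustrate the second point, here is a small simulation sketch in plain Python. It does not use the Kid Creative data: the variables income, employed, and buy are hypothetical, purchase is generated to depend on income only, and employment status is generated to be correlated with income. The logistic regression is fit by Newton-Raphson, standing in for a library routine such as R's glm. Fitted alone, employment should get a clearly positive coefficient; once income is also in the equation, its coefficient should fall to near 0.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logit(rows, y, iters=15):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    rows: list of predictor vectors, each starting with 1.0 for the intercept.
    """
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k                      # score: X^T (y - p)
        info = [[0.0] * k for _ in range(k)]  # information: X^T W X with W = p(1 - p)
        for x, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * xj for b, xj in zip(beta, x))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical simulated data: buying depends on income only,
# but employment status is correlated with income.
random.seed(1)
income, employed, buy = [], [], []
for _ in range(4000):
    inc = random.gauss(0.0, 1.0)                            # standardized income
    emp = 1.0 if inc + random.gauss(0.0, 1.0) > 0 else 0.0  # employment tracks income
    prob = 1.0 / (1.0 + math.exp(0.5 - inc))                # true log-odds: -0.5 + 1.0 * income
    income.append(inc)
    employed.append(emp)
    buy.append(1.0 if random.random() < prob else 0.0)

beta_alone = fit_logit([[1.0, e] for e in employed], buy)
beta_joint = fit_logit([[1.0, i, e] for i, e in zip(income, employed)], buy)

print("employed coefficient, alone:       ", round(beta_alone[1], 2))
print("employed coefficient, with income: ", round(beta_joint[2], 2))
```

With a different seed the exact numbers change, but the pattern (a sizable coefficient when employment is the only X-variable, a coefficient near 0 once income is included) should persist, because income already carries the information that employment status was standing in for.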

The next part of this series (Part 3) discusses assessing the impact of each of the variables.

Questions or Comments?

Any questions or comments? Please feel free to comment below. I am always looking to improve this material.

2 Responses

  1. Harihar Rajaram says:

    Hello

    Thanks for this site. I have a question. In your example, Income has a very low beta_i, i.e. 0.0002 and an odds ratio close to 1. I would view that as suggesting income has practically no influence on the dependent variable.

    But if the z value and p value are analyzed, the p-value is 0.0000, which means that the null hypothesis that the coefficient for income = 0 is strongly rejected. In other words, there is very high confidence that the coefficient for income is not 0, and in fact 0.0002 is a reliable estimate of the coefficient.

    But, should we not use a combination of the actual coefficient value (being “very” different from 0) or odds ratio (being “very” different from 1) and the p-value to decide which variables are explanatory?

    For example, I see that for income, there is high confidence that the coefficient =0.0002 and odds ratio = 1.0002. I see that as implying income is not an explanatory variable….but the p-value suggests otherwise.

    Thanks for clarifying this,

    Hari

  2. Deana Zainal says:

    Hi, I'm Deana. I just want to ask: what if we only had one X-variable? I'm working with a rainfall data series, and I want to know how significant the number of rain-days is for the amount of rainfall.
