Understanding Logistic Regression Output:
Part 1 — The Coefficient Table

As with least-squares linear regression, the most important part of the output of a logistic regression is the regression coefficient table. In this five-part series of articles, which parallels and references the five-part series reviewing the least-squares regression coefficient table, I will discuss the logistic regression coefficient table and its interpretation and uses.

Specifically, the five parts of this series are as follows:

  • Part 1 (this article): This part shows the output from a logistic regression based on the KidCreative dataset that I have discussed in a previous post (What a Multivariate Logistic Regression Data Set Looks Like: An Example).
  • Part 2 discusses how to determine which X-variables matter.
  • Part 3 examines the effects of the X-variables.
  • Part 4 shows how to use the fitted logistic regression equation to make predictions.
  • Part 5 discusses the uncertainty in the estimated logistic regression coefficients and briefly discusses uncertainty in predictions.

The KidCreative Logistic Regression

The KidCreative dataset (What a Multivariate Logistic Regression Data Set Looks Like: An Example) will serve as the example that I use to explain the logistic regression output and its interpretation. In the KidCreative example, we are trying to predict the probability that a customer will respond to an e-mail ad and buy a children’s magazine called “Kid Creative.” We ran an experiment and collected 673 observations in which a customer was shown the Kid Creative ad. For each of these observations, we recorded whether or not the customer bought, together with a set of explanatory X-variables. Since the dependent Y variable is binary, logistic regression is appropriate.

The coefficient table from the logistic regression output is shown below:

Since logistic regression is based on an equation that models the log odds as a linear function of the X’s, the equation that has been fit to the data is:

    \[\begin{split} \log\left( \frac{p}{1-p} \right) =& -17.91 + 0.000202 \times \text{Income} + 1.646 \times \text{IsFemale} \\ &+ 0.5662 \times \text{IsMarried} - 0.2794 \times \text{HasCollege}\\ &+ 0.2253 \times \text{IsProfessional} - 1.159 \times \text{IsRetired}\\ &+ 0.9886 \times \text{Unemployed} + 0.02468 \times \text{ResidenceLength}\\ &+ 0.4518 \times \text{DualIncome} + 1.133 \times \text{Minors}\\ &+ 1.056 \times \text{Own} - 0.9265 \times \text{House}\\ &+ 1.864 \times \text{White} + 1.53 \times \text{English}\\ &+ 1.557 \times \text{PrevChildMag} + 0.4777 \times \text{PrevParentMag} \end{split}\]
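Because the equation above gives the log odds, recovering an actual probability means inverting the logit with the logistic function, p = 1/(1 + e^(−log odds)). Here is a minimal sketch of that calculation using the fitted coefficients; the example customer’s values are made up purely for illustration, not taken from the dataset:

```python
import math

# Coefficients copied from the fitted log-odds equation above.
coef = {
    "Income": 0.000202, "IsFemale": 1.646, "IsMarried": 0.5662,
    "HasCollege": -0.2794, "IsProfessional": 0.2253, "IsRetired": -1.159,
    "Unemployed": 0.9886, "ResidenceLength": 0.02468, "DualIncome": 0.4518,
    "Minors": 1.133, "Own": 1.056, "House": -0.9265, "White": 1.864,
    "English": 1.53, "PrevChildMag": 1.557, "PrevParentMag": 0.4777,
}
intercept = -17.91

def predicted_probability(x):
    """Turn the linear log-odds score into a probability via the logistic function."""
    log_odds = intercept + sum(coef[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-log_odds))

# A hypothetical customer: all X-variables zero except the ones set below.
customer = {name: 0 for name in coef}
customer.update({"Income": 75000, "IsFemale": 1, "IsMarried": 1,
                 "ResidenceLength": 5, "White": 1, "English": 1})

p = predicted_probability(customer)   # ≈ 0.95 for this hypothetical customer
```

Note that the output is always strictly between 0 and 1, which is exactly why the log-odds formulation is used for a binary Y.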

Right now, I do not want to focus on the logistic regression equation (I return to this in a later post), but rather want to consider the form of the logistic regression coefficient table.

Notice how similar the logistic regression coefficient table is to the coefficient table for least-squares regression (click here to pull up our previous least-squares example):

  • Both tables list the names of the X-variables on the left
  • Both tables give the estimated values of the regression coefficients as the first numerical column.
  • Both tables give estimates of the standard errors of the regression coefficient estimates in the next column.
  • In the third numerical column, both tables compute the value of a statistic that is used to compute the p-value for the coefficient. In the logistic regression output here, the statistic is called a z-value. In the least-squares regression output, it is called a “t Stat,” but both statistics are serving the same purpose.
  • Both tables then compute the regression coefficients’ p-values. In the logistic regression output shown here, the p-value is denoted by the probability notation Pr(>|z|). In the least-squares regression output it was labeled “P-value.” But both columns serve exactly the same purpose.
  • The only real difference between the two tables is that the logistic regression output includes an additional column called the “Odds Ratio.” Note that not all logistic regression programs output the odds ratios in the coefficient table (they should), but they are always available somewhere in the output, and virtually all published results (in academic journals, for example) include the odds ratios in the coefficient table.
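The last three columns of the table are all simple functions of the first two: the odds ratio is the exponential of the coefficient estimate, the z-value is the estimate divided by its standard error, and the p-value comes from the standard normal distribution. A quick sketch of that arithmetic for the IsFemale coefficient (the standard error of 0.30 used here is hypothetical, chosen only to show the calculation; the real value comes from the coefficient table):

```python
import math

# Odds ratio = exp(coefficient), e.g. for IsFemale from the equation above.
odds_ratio = math.exp(1.646)   # ≈ 5.19: roughly 5x the odds of buying

# z-value = estimate / standard error (SE of 0.30 is hypothetical).
estimate, se = 1.646, 0.30
z = estimate / se

# Two-sided p-value from the standard normal CDF, Phi(x) = 0.5*(1 + erf(x/sqrt(2))).
phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
p_value = 2 * (1 - phi)        # tiny here, so the coefficient is significant
```

This is also why the odds-ratio column, even when a program omits it, can always be recovered by exponentiating the estimates column.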

In my brief review of least-squares regression, I outlined the four main uses of the regression coefficient table:

  1. Determining which variables matter
  2. Assessing the impact of the X-variables
  3. Making predictions
  4. Assessing uncertainty

Only one of these is handled very differently from least-squares regression, namely assessing the impact of the X-variables. I will discuss each of the four uses of the regression coefficient table in the articles that follow.

Questions or Comments?

Any questions or comments? Please feel free to comment below. I am always looking to improve this material.

I have had to implement a very simple “captcha” field because of spam comments, so be a bit careful with it. Enter your answer as a number (not a word). Also, save your comment before you submit it in case you make a mistake. You can use Ctrl-A and then Ctrl-C to save a copy on the clipboard. Particularly if your comment is long, I would hate for you to lose it.

