Understanding Logistic Regression Output:
Background Part 1 – Least Squares Regression

As I have indicated previously on this web site, I am going to use regular linear least-squares regression as a starting point for explaining logistic regression. Thus, I will be assuming that you have some familiarity with regular linear regression. I will then explain the meaning of the statistical output from logistic regression analysis by drawing parallels to the output from regular regression.

To that end, I am going to start with a brief review of regular linear least-squares regression. If you are already familiar with least-squares regression, you should skip this series and proceed directly to the discussion of the logistic regression output.

This review of least-squares regression consists of five parts:

  • Part 1 (this article): An example (based on the “Kid Creative” data) that shows the three sections that are always a part of the output from regular least squares regression.
  • Part 2: An outline of the main uses of the regression coefficient table output together with a brief discussion of the first use (determining which X-variables matter).
  • Part 3: Discussion of the second use of the coefficient table: Determining the individual effects of the X-variables.
  • Part 4: A quick review of how to make predictions using the fitted least-squares line. This is another important use of the regression coefficients.
  • Part 5: A discussion of the uncertainty in the regression coefficient estimates as well as a very brief discussion of the uncertainty of predictions.

So again, if you are reasonably familiar with all of these topics in the context of regular linear least-squares regression, you should skip this series of articles. That is, if you know what the three parts of a least-squares regression output are (the coefficient table, the ANOVA table, and the goodness of fit section) and you know what the main uses of the regression coefficient table are (determining which variables matter, what their effects are, making predictions, and determining uncertainty), then you should skip to the later articles that describe the logistic regression output. (Note: These articles are under development and will appear soon.)

Kid Creative Example: The Regression Variables

I will use the data from the “Kid Creative” example which I have developed previously to provide an example of regular linear least-squares regression. To review the full “set-up” and explanation of the Kid Creative data, click here. Many readers, however, will be able to make sense of this article without reviewing the complete Kid Creative “set up.” But if what follows does not make enough sense, I would suggest going back and reviewing the background material.

In the “Kid Creative” example, the dependent variable for the logistic regression analysis was whether or not the customer buys (coded 0 or 1). This variable, of course, is not appropriate for regular linear least-squares regression because it is a binary variable. So instead I am going to use one of the independent variables, namely household income, as the dependent Y variable in the example I will discuss in this article. Household income is a continuous variable that is appropriate for regular least-squares regression.

Specifically, I am going to regress (using regular least-squares regression) Household Income (rounded to the nearest $1,000) on the following X variables:

  • Gender (IsFemale = 1 if the person is female, 0 otherwise)
  • Marital Status (IsMarried = 1 if married, 0 otherwise)
  • College Educated (HasCollege = 1 if has one or more years of college education, 0 otherwise)
  • Employed in a Profession (IsProfessional = 1 if employed in a profession, 0 otherwise)
  • Retired (IsRetired = 1 if retired, 0 otherwise)
  • Not employed (Unemployed = 1 if not employed, 0 otherwise)
  • Length of Residency in Current City (ResLength; in years)
  • Dual Income if Married (Dual = 1 if dual income, 0 otherwise)
  • Children (Minors = 1 if children under 18 are in the household, 0 otherwise)
  • Home ownership (Own = 1 if own residence, 0 otherwise)
  • Residence Type (House = 1 if a house or townhouse, 0 otherwise)
  • Race (White = 1 if race is white, 0 otherwise)
  • Language (English = 1 if the primary language in the household is English, 0 otherwise)
  • Previously purchased a parenting magazine (PrevParent = 1 if previously purchased a parenting magazine, 0 otherwise)
  • Previously purchased a children’s magazine (PrevChild = 1 if previously purchased a children’s magazine, 0 otherwise)

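To make the coding scheme concrete, here is a small sketch (in Python) of how a single customer record might be turned into the 0/1 dummy variables listed above. The raw field names in the input record are hypothetical; only the dummy-variable names come from the list in this article, and only a handful of them are shown.

```python
# Sketch: coding one (hypothetical) customer record into a few of the
# 0/1 dummy variables listed above. Raw field names are illustrative.
def encode(record):
    return {
        "IsFemale":   1 if record["gender"] == "female" else 0,
        "IsMarried":  1 if record["married"] else 0,
        "HasCollege": 1 if record["years_of_college"] >= 1 else 0,
        "ResLength":  record["years_in_city"],   # continuous, in years
        "Own":        1 if record["owns_residence"] else 0,
    }

row = encode({"gender": "female", "married": True,
              "years_of_college": 2, "years_in_city": 7,
              "owns_residence": False})
print(row)
# {'IsFemale': 1, 'IsMarried': 1, 'HasCollege': 1, 'ResLength': 7, 'Own': 0}
```

Note that ResLength is the one continuous X-variable in the list; everything else is a 0/1 indicator.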
The Kid Creative Example: The Regression Output

The output of regular linear least-squares regression programs always has three sections:

  1. A table of the regression coefficients and related statistics.
  2. “Goodness of fit” information (that is, how well the estimated regression model fits the data).
  3. An Analysis of Variance (or ANOVA) table.

In regular linear least-squares regression, the coefficient table always includes the coefficient estimates (the "beta-hats") and their p-values. The goodness of fit section always includes the R^2. The ANOVA table always includes an F-statistic and its p-value, which are used to test whether or not the regression is finding any useful information in the X-variables.
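These three quantities can all be computed directly from the least-squares fit. Here is a minimal sketch using NumPy on synthetic data (not the actual Kid Creative data) that produces the beta-hats, the R^2, and the overall F-statistic:

```python
import numpy as np

# Synthetic data: 200 "customers", 3 X-variables, known true coefficients.
rng = np.random.default_rng(0)
n, k = 200, 3
X = rng.normal(size=(n, k))
y = 40 + X @ np.array([5.0, -2.0, 0.0]) + rng.normal(scale=4.0, size=n)

# Coefficient table: least-squares estimates (intercept first).
Xd = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)

# Goodness of fit: R^2 = 1 - SS_residual / SS_total.
resid = y - Xd @ beta_hat
ss_res = resid @ resid
ss_tot = ((y - y.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot

# ANOVA: overall F-statistic, F(k, n - k - 1) under the null.
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

print("beta-hats:", np.round(beta_hat, 2))
print("R^2:", round(r2, 3), " F:", round(f_stat, 1))
```

A regression program such as Excel reports exactly these quantities (plus standard errors and p-values) in its three output sections.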

I am going to use MS Excel to compute the usual least-squares regression fit. All regression programs are basically the same as far as the output goes, so it really does not matter that I have selected Excel. I have chosen to use Excel here because I expect that essentially everyone reading this will have seen Excel regression output before.

Here is the output from Excel for the regression of Household Income on the X variables:

Note that the Excel regression output has exactly the three sections described above, though in a different order. For the Excel output, the first section is the "goodness of fit" section that includes the R^2. Next comes the ANOVA table with the F-statistic and its p-value. Last comes the regression coefficient table, which includes the estimated regression coefficients (the "betas"), their standard errors, t-statistics, and the corresponding p-values.

In Part 2 of this series, I will focus on what is probably the most important part of the regression output — the coefficient table — and will discuss its main uses. Click here to proceed to Part 2.

Questions or Comments?

Any questions or comments? Please feel free to comment below. I am always looking to improve this material.

I have had to implement a very simple "captcha" field because of spam comments, so be a bit careful about that. Enter your answer as a number (not a word). Also, save your comment before you submit it in case you make a mistake. You can use Ctrl-A and then Ctrl-C to save a copy on the clipboard. Particularly if your comment is long, I would hate for you to lose it.
