This article discusses making predictions using logistic regression. It is part 4 of a five-part series focused on understanding and interpreting the logistic regression coefficient output table. Click here to see a brief summary of the focus of each of the 5 parts.
Coefficient Table Use #3: Making Predictions
Making predictions in logistic regression is very similar to making predictions in least-squares regression (click here for a review of prediction in least-squares regression). All you do is plug the $x$-variables into the logistic regression equation specified by the regression coefficient estimates that you get from the coefficient table output. There are two differences, however. First, the prediction given by logistic regression is of the probability that the dependent variable is 1 (that is, $P(Y = 1)$). This is slightly different from predicting the value of $Y$ itself, since $Y$ is either 0 or 1, while the predicted probability will be a number in between, for example 0.75. Second, the prediction equation in logistic regression is more complicated than for regular linear least-squares regression.
As discussed previously, logistic regression fits a linear equation to the log odds:

$$\ln\left(\frac{p}{1-p}\right) = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$

Here $p = P(Y = 1)$, and $b_0, b_1, \ldots, b_k$ are the estimates of the regression coefficients from the coefficient table output. So if we want to calculate $p$ directly, we need to solve this equation for $p$. The solution is:

$$p = \frac{e^{b_0 + b_1 x_1 + \cdots + b_k x_k}}{1 + e^{b_0 + b_1 x_1 + \cdots + b_k x_k}} = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + \cdots + b_k x_k)}}$$
I am not going to derive this equation here, but if you want all of the details, you can see them here. Otherwise, just accept that this equation shows how to calculate the predicted probability $p$ from the $x$-values together with the estimates of the regression coefficients.
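If you prefer to see the calculation as code, here is a minimal sketch in Python. The helper function `predict_probability` is not from the article; it is simply the formula above written out, taking a list of coefficient estimates (intercept first) and a list of predictor values:

```python
import math

def predict_probability(b, x):
    """Predicted probability P(Y = 1) from logistic regression.

    b: coefficient estimates [b0, b1, ..., bk], with b0 the intercept
    x: predictor values [x1, ..., xk]
    """
    # Linear equation for the log odds: b0 + b1*x1 + ... + bk*xk
    log_odds = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    # Invert the log odds to get the probability
    return 1.0 / (1.0 + math.exp(-log_odds))

# A quick check: log odds of 0 correspond to a probability of exactly 0.5
print(predict_probability([0.0], []))  # prints 0.5
```

Note that when the log odds are 0, the two outcomes are equally likely, which is why the probability comes out to 0.5.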
I am now going to show you an example using the “Kid Creative” data. First, we will need the regression coefficients from the coefficient output table shown below (from the column labeled “Estimate”). Using these estimated logistic regression coefficients, the prediction equation is
Now suppose I wanted to predict the probability that the following person buys the Kid Creative magazine subscription:
- Income: Income = 58000
- Gender Female: IsFemale = 1
- Married: IsMarried = 1
- College Educated: HasCollege = 1
- Not a Professional: IsProfessional = 0
- Not Retired: IsRetired = 0
- Employed: Unemployed = 0
- Eight years of Residency in Current City: ResLength = 8
- Dual Income: Dual = 1
- Does not have Children: Minors = 0
- Owns Home: Own = 1
- Lives in a house: House = 1
- Race is white: White = 1
- First language is English: English = 1
- Has not previously purchased a children’s magazine: PrevChild = 0
- Has previously purchased a parenting magazine: PrevParent = 1
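The calculation for this person can be sketched in Python. To be clear, the coefficient values below are made-up placeholders for illustration only, NOT the actual estimates from the Kid Creative coefficient table (which you would read from the “Estimate” column); only the person’s predictor values come from the list above:

```python
import math

# HYPOTHETICAL coefficient estimates -- placeholders for illustration,
# not the actual values from the Kid Creative coefficient table.
b0 = -15.9  # intercept (hypothetical)
coeffs = {
    "Income": 0.0002, "IsFemale": 1.0, "IsMarried": 0.5, "HasCollege": 0.6,
    "IsProfessional": 0.4, "IsRetired": -0.3, "Unemployed": -0.8,
    "ResLength": 0.05, "Dual": 0.3, "Minors": 0.7, "Own": 0.4,
    "House": 0.3, "White": 0.2, "English": 0.2, "PrevChild": 0.9,
    "PrevParent": 0.8,
}

# The person described in the list above
person = {
    "Income": 58000, "IsFemale": 1, "IsMarried": 1, "HasCollege": 1,
    "IsProfessional": 0, "IsRetired": 0, "Unemployed": 0, "ResLength": 8,
    "Dual": 1, "Minors": 0, "Own": 1, "House": 1, "White": 1,
    "English": 1, "PrevChild": 0, "PrevParent": 1,
}

# Plug the x-values into the linear equation for the log odds,
# then invert to get the predicted probability
log_odds = b0 + sum(coeffs[v] * person[v] for v in coeffs)
p_hat = 1.0 / (1.0 + math.exp(-log_odds))
print(f"Predicted probability of buying: {p_hat:.3f}")
```

With these made-up coefficients the result happens to land near 0.6; with the real estimates from the coefficient table, the mechanics of the calculation are exactly the same.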
It turns out that the variable values that I used in making this prediction correspond to observation number 184 in the KidCreative data set. This particular person happened to buy the magazine, but with the odds of buying around 60-40 (if the prediction is correct), it certainly could have gone the other way.
So now I have explained how to use the output from logistic regression to make predictions about the probability of “success.” Such predictions are extremely useful, as they are often the key “ingredient” in many data mining, machine learning, marketing analytics, and other “big data” problems where the analysis and prediction are automated. But, of course, such predictions are very useful in “small data” problems as well.
In the final part of this five-part series, I will discuss assessing the uncertainty of the regression coefficients and odds ratios.
Questions or Comments?
Any questions or comments? Please feel free to comment below. I am always looking to improve this material.
I have had to implement a very simple “captcha” field because of spam comments, so be a bit careful about that: enter your answer as a number (not a word). Also, save a copy of your comment before you submit it in case you make a mistake. You can use Ctrl-A and then Ctrl-C to put a copy on the clipboard. Particularly if your comment is long, I would hate for you to lose it.