Note: If you want to jump directly to the “punch line,” skipping all of my explanation and development (and making me feel unappreciated :( ), click here.
Recall that in regular least-squares regression we fit a line to the data. More technically, what we do is model the expected value (average value) of the dependent variable Y as a linear function of the explanatory X-variables. That is, we model it by the linear equation

$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$
In the last article (click here to review it), I explained why this equation makes no sense when your dependent Y variable is binary (only takes the values of 0 or 1).
So what do we do? We would still very much like to be dealing with a linear function of the X's because linear functions are relatively simple and interpretable. Thus, we would like to keep the right-hand side of our equation the same as in the least-squares case:

$$\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$
Perhaps we can fix things by applying some kind of function to the left-hand side of the equation so that it makes sense as a model for when Y is binary. That is, maybe we can find a function f( ) so that

$$f(E(Y)) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$
This can, in fact, be done. The special function f( ) we use is called the logistic function (or logistic transform):

$$f(p) = \log\!\left(\frac{p}{1-p}\right)$$
I have used p as the argument of the logistic function because the argument takes on values between 0 and 1 (like a probability). Also, note that the log function here is the natural log (log to the base e). (Note: for more discussion of logarithms and the notation log( ), see the next post by clicking here.)
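To get a feel for the transform, here are a couple of values worked out by hand (my own quick sanity check, not numbers from any data set in this series):

$$f(0.5) = \log\!\left(\frac{0.5}{0.5}\right) = \log(1) = 0, \qquad f(0.9) = \log\!\left(\frac{0.9}{0.1}\right) = \log(9) \approx 2.197$$

Notice that as p gets close to 0 the result heads off to negative infinity, and as p gets close to 1 it heads off to positive infinity. The transform stretches the interval (0, 1) out over the whole real line, which is exactly what lets us equate it to a linear right-hand side that can take any value.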
The next thing that I need to point out is that, when Y is a binary variable (taking only the values 0 and 1), $E(Y) = p$, where p is the probability that Y takes the value 1. In words, this says that the expected value of Y (that is, its average value) is the probability that Y is 1. If you do not see why $E(Y) = p$, accept it as a fact for now. I will explain it in a later article.
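(If you would like a one-line preview of that explanation: since Y takes only the values 0 and 1, its average is

$$E(Y) = 1 \cdot P(Y = 1) + 0 \cdot P(Y = 0) = P(Y = 1) = p.$$

This is a standard fact about 0/1 variables, not anything special about our data.)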
So putting all of this together (the punch line!), the key equation (usually termed the “multivariate logistic regression equation” or “multivariate logistic regression model”) that we fit to our data is

$$\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$
where p is the probability that Y is 1. Since $\frac{p}{1-p}$ is called the “odds” (more about odds here), what logistic regression does is model the log odds as a linear function of the X-variables.
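If you like to see things in code, here is a minimal sketch of fitting such a model in Python with the statsmodels package. The simulated data and variable names are my own illustration, not data from this series of posts:

import numpy as np
import statsmodels.api as sm

# Simulate binary data: one explanatory variable X and a 0/1 response Y
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=200)
log_odds = -5 + 1.0 * X                        # the linear part: b0 + b1*X
p = np.exp(log_odds) / (1 + np.exp(log_odds))  # convert log odds to probability
Y = rng.binomial(1, p)                         # Y is 1 with probability p

# Fit log(p / (1 - p)) = b0 + b1*X by maximum likelihood
fit = sm.Logit(Y, sm.add_constant(X)).fit()
print(fit.params)                              # estimated b0 and b1

The reported coefficients are on the log-odds scale, which is exactly the left-hand side of the equation above.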
For completeness here, I am now going to “undo” the logistic transform and show you the equation for p (recall that $p = E(Y)$). Here it is:

$$p = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k}}$$
Notice that this equation for p is clearly not linear. You may well not understand how I got from the last equation to this one. That is OK. Accept it for now and I will explain it in detail in a later post.
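(For readers who cannot wait for that post, a quick preview of the algebra: writing $\eta$ for the linear part $\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$, exponentiate both sides of the logistic regression equation and solve for p:

$$\log\!\left(\frac{p}{1-p}\right) = \eta \;\Longrightarrow\; \frac{p}{1-p} = e^{\eta} \;\Longrightarrow\; p = \frac{e^{\eta}}{1 + e^{\eta}}.)$$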
So now I will show you in a graph what the logistic regression equation is doing. The figure below shows the same binary data (the kind appropriate for logistic regression) that I used in the post “Why Regular Regression Does NOT Work” (click here to review it), but now with the logistic equation fit to the data.
In this figure, the smooth s-shaped trace shows the logistic function that has been fit to the binary data. This function is an estimate of the probability that Y is 1. As you can see, that probability is very small on the left-hand side of the figure, increases through the middle, and is nearly 1 on the right-hand side.
Just to contrast the logistic regression fit with the regular least-squares regression line, I will now add the least-squares line to the figure.
This figure clearly shows how silly the least-squares line is for this binary data and how well the logistic curve estimates the probability that the dependent variable is 1.
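If you would like to draw a figure along these lines yourself, here is a sketch in Python, again with made-up simulated data standing in for the data from the earlier post:

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Simulated binary data (a stand-in for the data from the earlier post)
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
p_true = 1 / (1 + np.exp(-(-5 + 1.0 * x)))
y = rng.binomial(1, p_true)

# Fit both models to the same data
logit_fit = sm.Logit(y, sm.add_constant(x)).fit()
ols_fit = sm.OLS(y, sm.add_constant(x)).fit()

# Plot the data, the s-shaped logistic curve, and the least-squares line
grid = np.linspace(0, 10, 300)
plt.scatter(x, y, s=10, alpha=0.5, label="binary data")
plt.plot(grid, logit_fit.predict(sm.add_constant(grid)), label="logistic fit")
plt.plot(grid, ols_fit.predict(sm.add_constant(grid)), label="least-squares line")
plt.xlabel("X")
plt.ylabel("estimated P(Y = 1)")
plt.legend()
plt.show()

Run this and you will see the same story as in my figures: the least-squares line wanders outside the 0-to-1 range, while the logistic curve stays inside it.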
Depending on your mathematical background, the above may seem a bit complicated, confusing, and maybe even mysterious. Don’t worry. I am going to help you out in the next few articles.
As always, if you have any comments or questions please feel free to leave them below.