In logistic regression, the Y-variable is generally binary; that is, it takes on only the values 0 and 1. If the original variable is dichotomous (e.g., “yes” or “no”), its two categories are coded as 0 and 1, and you get to choose which category is coded as 1.
In regression, in addition to your dependent variable (your Y-variable), you also have explanatory variables (your X-variables). Your goal is to understand how the X-variables are related to the Y-variable.
For example, you might be interested in factors that influence (or explain) whether a person in the U.S. owns a U.S.-made or a foreign-made (non-U.S.) car. It would be natural to code owning a U.S. car as 1 and owning a foreign car as 0. The Y-variable in the logistic regression is then whether the person owns a U.S. car, coded as 1 if he or she does and 0 otherwise.
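As a small illustration of this coding step, the Python sketch below turns survey answers into the 0/1 Y-variable. The answer strings "U.S." and "foreign", the list `responses`, and the helper `code_outcome` are hypothetical, made up for this example. Raising an error on any other answer is a deliberate choice. It prevents typos or missing data from being silently coded as 0.

```python
# Hypothetical survey answers: which kind of car each person owns.
responses = ["U.S.", "foreign", "U.S.", "U.S.", "foreign"]


def code_outcome(response: str) -> int:
    """Code owning a U.S. car as 1 and owning a foreign car as 0 (hypothetical helper)."""
    if response == "U.S.":
        return 1
    if response == "foreign":
        return 0
    # Refuse to guess: anything else is neither category.
    raise ValueError(f"unexpected response: {response!r}")


y = [code_outcome(r) for r in responses]
print(y)  # [1, 0, 1, 1, 0]
```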
You might then want to study how various factors influence whether a person owns a U.S. car. Variables you might consider include income, age, gender, marital status, children, political affiliation, and so on. These are the X-variables in the logistic regression, which you will use to explain or predict the value of the Y-variable. For example, you might want to know whether gender matters: are men or women more likely to own U.S. cars?
In logistic regression, the X-variables are used to build a mathematical equation that predicts the probability that the Y-variable equals 1. We therefore use logistic regression when it is plausible that the Y-variable's value, 0 or 1, is determined like the flip of a coin whose chance of landing “heads” (1) depends on the X-variables. Unlike a fair coin, this coin does not land heads 50% of the time. Its probability of heads depends on the values taken by the X-variables.
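In its standard form, that equation passes a linear combination of the X-variables through the logistic function. Suppose there are k X-variables x_1, …, x_k, with coefficients β_0, …, β_k estimated from the data. These symbols are notation introduced here for illustration. The predicted probability that Y equals 1 is then:

$$
p = \Pr(Y = 1 \mid x_1, \ldots, x_k) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}},
\qquad
\log\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k .
$$

The right-hand side of the first expression always lies strictly between 0 and 1, so it can be read as the coin's probability of “heads.” The second expression says the same thing differently: the log-odds of Y = 1 are a linear function of the X-variables.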
In summary, then, we use logistic regression when:
- We have a binary or dichotomous Y-variable.
- We have explanatory X-variables that we think are related to the Y-variable.
- It is reasonable to think that the value the Y-variable takes on is like a coin flip where the probability of getting a 1 (“heads”) depends on the explanatory variables.
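The coin-flip picture in these three conditions can be made concrete with a small simulation. The Python sketch below uses only the standard library and makes several assumptions. There is a single hypothetical X-variable `x`, and the "true" coefficients `beta0 = -0.5` and `beta1 = 1.2` are made up. The function names `logistic`, `simulate`, and `fit` are also illustrative. Each Y is one biased coin flip with probability 1 / (1 + e^-(beta0 + beta1·x)). The coefficients are then recovered by Newton's method on the log-likelihood. This is one common way to fit the model, not necessarily what any particular statistics package does internally.

```python
import math
import random


def logistic(z: float) -> float:
    """Map any real number to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n: int, beta0: float, beta1: float, seed: int = 0):
    """Generate (x, y) pairs where y is a coin flip whose P(y = 1) depends on x."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(-3.0, 3.0)
        p = logistic(beta0 + beta1 * x)   # probability of "heads" depends on x
        y = 1 if rng.random() < p else 0  # one biased coin flip
        xs.append(x)
        ys.append(y)
    return xs, ys


def fit(xs, ys, iterations: int = 25):
    """Estimate (beta0, beta1) by Newton's method on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g0, g1) and negated Hessian [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(b0 + b1 * x)
            g0 += y - p
            g1 += (y - p) * x
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system [[h00, h01], [h01, h11]] @ d = g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


xs, ys = simulate(5000, beta0=-0.5, beta1=1.2, seed=1)
b0_hat, b1_hat = fit(xs, ys)
print(f"estimated beta0 = {b0_hat:.2f}, beta1 = {b1_hat:.2f}")
```

With 5,000 simulated people, the estimates should land close to the values used to generate the data. Rerunning with a different `seed` shows how much they vary from sample to sample.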
Comments or questions are welcome! I want to keep improving this material.