Logistic Regression vs. Linear Regression: The Key Differences

Two of the most commonly used regression models are linear regression and logistic regression.

Both types of regression models are used to quantify the relationship between one or more predictor variables and a response variable, but they differ in four key ways, each summarized below.

Difference #1: Type of Response Variable

A linear regression model is used when the response variable takes on a continuous value such as:

Price
Height
Age
Distance

Conversely, a logistic regression model is used when the response variable takes on a categorical value, most often one of two classes, such as:

Yes or No
Male or Female
Win or Not Win

Difference #2: Equation Used

Linear regression uses the following equation to summarize the relationship between the predictor variable(s) and the response variable:

Y = β₀ + β₁X₁ + β₂X₂ + … + β_pX_p

where:

Y: The response variable
X_j: The jth predictor variable
β_j: The average effect on Y of a one-unit increase in X_j, holding all other predictors fixed
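To make the interpretation of β_j concrete, here is a minimal Python sketch of the linear equation. The function name `linear_predict` and the coefficient values are hypothetical, chosen only for illustration; they do not come from any fitted model.

```python
def linear_predict(intercept, coefs, x):
    """Return β₀ + β₁x₁ + … + β_p x_p for a single observation."""
    if len(coefs) != len(x):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * xj for b, xj in zip(coefs, x))

# Hypothetical coefficients: β₀ = 10, β₁ = 2, β₂ = -1.
base = linear_predict(10.0, [2.0, -1.0], [3.0, 4.0])
# Same observation with X₁ increased by one unit and X₂ held fixed.
bumped = linear_predict(10.0, [2.0, -1.0], [4.0, 4.0])

print(base)           # 12.0
print(bumped - base)  # 2.0, which equals β₁
```

Increasing X₁ by one unit while holding X₂ fixed changes the prediction by exactly β₁, which is what the definition of β_j describes.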

Conversely, logistic regression uses the following equation:

p(X) = e^{β₀ + β₁X₁ + β₂X₂ + … + β_pX_p} / (1 + e^{β₀ + β₁X₁ + β₂X₂ + … + β_pX_p})

This equation is used to predict the probability that an individual observation falls into a certain category.
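The equation above can be sketched in Python as follows. The function name `logistic_probability` and the example coefficients are assumptions made for illustration. The two-branch form is simply a numerically stable way of evaluating e^z / (1 + e^z), not a different formula.

```python
import math

def logistic_probability(intercept, coefs, x):
    """Return p(X) = e^z / (1 + e^z), where z = β₀ + β₁x₁ + … + β_p x_p."""
    z = intercept + sum(b * xj for b, xj in zip(coefs, x))
    if z >= 0:
        # Algebraically identical to e^z / (1 + e^z); avoids overflow for large z.
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical coefficients: β₀ = 0, β₁ = 1.
p_zero = logistic_probability(0.0, [1.0], [0.0])   # z = 0
p_high = logistic_probability(0.0, [1.0], [5.0])   # z = 5
p_low = logistic_probability(0.0, [1.0], [-5.0])   # z = -5

print(p_zero)  # 0.5
print(p_low < p_zero < p_high)  # True
```

Whatever the value of the linear combination in the exponent, the result stays between 0 and 1, which is why it can be read as a probability.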

Difference #3: Method Used to Fit Equation

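Linear regression uses a method known as ordinary least squares to find the best-fitting regression equation: it chooses the coefficients that minimize the sum of squared differences between the observed and predicted values of the response variable.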
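As a rough sketch of ordinary least squares, the snippet below uses NumPy's `numpy.linalg.lstsq`. The helper name `fit_ols` and the toy data are assumptions: the data were constructed to satisfy Y = 1 + 2X₁ + 3X₂ exactly, so the recovered coefficients are known in advance.

```python
import numpy as np

def fit_ols(X, y):
    """Return [β₀, β₁, …, β_p] minimizing the sum of squared residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept β₀.
    design = np.column_stack([np.ones(len(X)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

# Made-up toy data that follow Y = 1 + 2*X1 + 3*X2 with no noise.
X_toy = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y_toy = [1, 3, 4, 6, 8]

ols_coefs = fit_ols(X_toy, y_toy)
print(ols_coefs)  # approximately [1. 2. 3.]
```

With real, noisy data the fitted line will not pass through every point; the coefficients returned are those with the smallest total squared error.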

Conversely, logistic regression uses a method known as maximum likelihood estimation to find the best-fitting regression equation: it chooses the coefficients that make the observed outcomes most probable under the model.
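Maximum likelihood estimation for logistic regression has no closed-form solution, so the coefficients are found iteratively. The sketch below uses Newton's method with NumPy, which is one common choice and an assumption here; statistical packages may use different optimizers. The names `log_likelihood` and `fit_logistic_mle` and the toy data are made up for illustration.

```python
import numpy as np

def log_likelihood(beta, design, y):
    """Log-likelihood of the logistic model: sum of y*z - log(1 + e^z)."""
    z = design @ beta
    return float(np.sum(y * z - np.logaddexp(0.0, z)))

def fit_logistic_mle(X, y, n_iter=25):
    """Return [β₀, β₁, …] found by Newton's method on the log-likelihood."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(design @ beta)))   # current predicted probabilities
        gradient = design.T @ (y - p)                # slope of the log-likelihood
        # Negative of the log-likelihood's second-derivative matrix.
        hessian = design.T @ (design * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

# Made-up toy data: one predictor, binary outcome, classes deliberately overlapping.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]

beta_hat = fit_logistic_mle(x, y)
print(beta_hat)  # [intercept, slope]; the slope is positive for these data
```

The toy outcomes overlap on purpose: if the two classes could be separated perfectly by the predictor, the likelihood would have no finite maximum and this simple loop would not converge.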

Difference #4: Output to Predict

Linear regression predicts a continuous value as the output. For example:

Price ($150, $199, $400, etc.)
Height (14 inches, 2 feet, 94.32 centimeters, etc.)
Age (2 months, 6 years, 41.5 years, etc.)
Distance (1.23 miles, 4.5 kilometers, etc.)

Conversely, logistic regression predicts probabilities as the output. For example:

40.3% chance of getting accepted to a university.
93.2% chance of winning a game.
34.2% chance of a law getting passed.

When to Use Logistic vs. Linear Regression

The following practice problems can help you gain a better understanding of when to use logistic regression or linear regression.

Problem #1: Annual Income

Suppose an economist wants to use predictor variables (1) weekly hours worked and (2) years of education to predict the annual income of individuals.

In this scenario, he would use linear regression because the response variable (annual income) is continuous.

Problem #2: University Acceptance

Suppose a college admissions officer wants to use the predictor variables (1) GPA and (2) ACT score to predict the probability that a student will get accepted into a certain university.

In this scenario, she would use logistic regression because the response variable is categorical and can only take on two values – accepted or not accepted.

Problem #3: Home Price

Suppose a real estate agent wants to use the predictor variables (1) square footage, (2) number of bedrooms, and (3) number of bathrooms to predict the selling price of houses.

In this scenario, she would use linear regression because the response variable (price) is continuous.

Problem #4: Spam Detection

Suppose a computer programmer wants to use the predictor variables (1) number of words and (2) country of origin to predict the probability that a given email is spam.

In this scenario, he would use logistic regression because the response variable is categorical and can only take on two values – spam or not spam.
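The reasoning used in the four problems above can be condensed into a small rule of thumb. The following Python sketch is only a heuristic under stated assumptions: the function name `suggest_model` is hypothetical, the sample response values are made up, and a real decision should rest on what the response variable means rather than on the values that happen to be observed.

```python
def suggest_model(response_values):
    """Suggest a model type from the values a response variable takes."""
    distinct = set(response_values)
    if len(distinct) == 2:
        # Two possible outcomes, e.g. accepted / not accepted, spam / not spam.
        return "logistic regression"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct):
        # Numeric response with many possible values, e.g. income or price.
        return "linear regression"
    return "unclear"

# Made-up response values mirroring the practice problems.
income_choice = suggest_model([52000.0, 61000.5, 48250.0, 75300.0])
acceptance_choice = suggest_model(["accepted", "not accepted", "accepted"])
spam_choice = suggest_model([True, False, False, True])

print(income_choice)      # linear regression
print(acceptance_choice)  # logistic regression
print(spam_choice)        # logistic regression
```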
