In statistics, regression analysis is a technique that can be used to analyze the relationship between predictor variables and a response variable.
When you use software (such as R, Stata, or SPSS) to perform a regression analysis, you will receive a regression table as output that summarizes the results of the regression.
Arguably the most important numbers in the output of the regression table are the regression coefficients. Yet, despite their importance, many people have a hard time correctly interpreting these numbers.
This tutorial walks through an example of a regression analysis and provides an in-depth explanation of how to interpret the regression coefficients that result from the regression.
Related: How to Read and Interpret an Entire Regression Table
A Regression Analysis Example
Suppose we are interested in running a regression analysis using the following variables:
Predictor Variables
- Total number of hours studied (continuous variable – between 0 and 20)
- Whether or not a student used a tutor (categorical variable – “yes” or “no”)
Response Variable
- Exam score (continuous variable – between 1 and 100)
We are interested in examining the relationship between the predictor variables and the response variable to find out if hours studied and whether or not a student used a tutor actually have a meaningful impact on their exam score.
Suppose we run a regression analysis and get the following output:
| Term | Coefficient | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 48.56 | 14.32 | 3.39 | 0.002 |
| Hours studied | 2.03 | 0.67 | 3.03 | 0.009 |
| Tutor | 8.34 | 5.68 | 1.47 | 0.138 |
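A table like this comes from fitting the model to data. As a rough illustration, the sketch below fits the same two-predictor model with NumPy's least-squares solver on synthetic data (the data behind the table above is not given, so the recovered coefficients will only approximate it):

```python
import numpy as np

# Synthetic data for illustration only -- generated so that the true
# coefficients resemble the table above (intercept 48.56, hours 2.03, tutor 8.34)
rng = np.random.default_rng(0)
n = 200
hours = rng.uniform(0, 20, n)            # hours studied (continuous, 0-20)
tutor = rng.integers(0, 2, n).astype(float)  # tutor used (0 = no, 1 = yes)
score = 48.56 + 2.03 * hours + 8.34 * tutor + rng.normal(0, 5, n)

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(n), hours, tutor])
coefs, *_ = np.linalg.lstsq(X, score, rcond=None)

print(f"Intercept:     {coefs[0]:.2f}")
print(f"Hours studied: {coefs[1]:.2f}")
print(f"Tutor:         {coefs[2]:.2f}")
```

Statistical software additionally reports the standard errors, t statistics, and p-values shown in the table; the least-squares fit above recovers only the coefficients.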
Let’s take a look at how to interpret each regression coefficient.
Interpreting the Intercept
The intercept term in a regression table tells us the average expected value for the response variable when all of the predictor variables are equal to zero.
In this example, the regression coefficient for the intercept is equal to 48.56. This means that for a student who studied for zero hours (Hours studied = 0) and did not use a tutor (Tutor = 0), the average expected exam score is 48.56.
It’s important to note that the regression coefficient for the intercept is only meaningful if it’s reasonable that all of the predictor variables in the model can actually be equal to zero. In this example, it’s certainly possible for a student to have studied for zero hours (Hours studied = 0) and to have also not used a tutor (Tutor = 0). Thus, the interpretation for the regression coefficient of the intercept is meaningful in this example.
In some cases, though, the regression coefficient for the intercept is not meaningful. For example, suppose we ran a regression analysis using square footage as a predictor variable and house value as a response variable. In the output regression table, the regression coefficient for the intercept term would not have a meaningful interpretation since square footage of a house can never actually be equal to zero. In that case, the regression coefficient for the intercept term simply anchors the regression line in the right place.
Interpreting the Coefficient of a Continuous Predictor Variable
For a continuous predictor variable, the regression coefficient represents the difference in the predicted value of the response variable for each one-unit change in the predictor variable, assuming all other predictor variables are held constant.
In this example, Hours studied is a continuous predictor variable that ranges from 0 to 20 hours. In some cases, a student studied as few as zero hours and in other cases a student studied as many as 20 hours.
From the regression output, we can see that the regression coefficient for Hours studied is 2.03. This means that, on average, each additional hour studied is associated with an increase of 2.03 points on the final exam, assuming the predictor variable Tutor is held constant.
For example, consider student A who studies for 10 hours and uses a tutor. Also consider student B who studies for 11 hours and also uses a tutor. According to our regression output, student B is expected to receive an exam score that is 2.03 points higher than student A.
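The comparison between student A and student B can be checked directly by plugging both students into the fitted equation (the function name below is just for illustration):

```python
def predicted_score(hours_studied, tutor):
    # Coefficients taken from the regression table above
    return 48.56 + 2.03 * hours_studied + 8.34 * tutor

student_a = predicted_score(10, tutor=1)  # studied 10 hours, used a tutor
student_b = predicted_score(11, tutor=1)  # studied 11 hours, used a tutor

# The difference is exactly the Hours studied coefficient, since only
# hours changed (by one unit) while Tutor was held constant
print(round(student_b - student_a, 2))
```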
The p-value from the regression table tells us whether or not this regression coefficient is actually statistically significant. We can see that the p-value for Hours studied is 0.009, which is statistically significant at an alpha level of 0.05.
Note: The alpha level should be chosen before the regression analysis is conducted – common choices for the alpha level are 0.01, 0.05, and 0.10.
Related post: An Explanation of P-Values and Statistical Significance
Interpreting the Coefficient of a Categorical Predictor Variable
For a categorical predictor variable, the regression coefficient represents the difference in the predicted value of the response variable between the category coded 1 and the reference category coded 0.
In this example, Tutor is a categorical predictor variable that can take on two different values:
- 1 = the student used a tutor to prepare for the exam
- 0 = the student did not use a tutor to prepare for the exam
From the regression output, we can see that the regression coefficient for Tutor is 8.34. This means that, on average, a student who used a tutor scored 8.34 points higher on the exam compared to a student who did not use a tutor, assuming the predictor variable Hours studied is held constant.
For example, consider student A who studies for 10 hours and uses a tutor. Also consider student B who studies for 10 hours and does not use a tutor. According to our regression output, student A is expected to receive an exam score that is 8.34 points higher than student B.
The p-value from the regression table tells us whether or not this regression coefficient is actually statistically significant. We can see that the p-value for Tutor is 0.138, which is not statistically significant at an alpha level of 0.05. This indicates that although students who used a tutor scored higher on the exam, this difference could have been due to random chance.
Interpreting All of the Coefficients At Once
We can use all of the coefficients in the regression table to create the following estimated regression equation:
Expected exam score = 48.56 + 2.03*(Hours studied) + 8.34*(Tutor)
Note: Keep in mind that the predictor variable “Tutor” was not statistically significant at alpha level 0.05, so you may choose to remove this predictor from the model and not use it in the final estimated regression equation.
Using this estimated regression equation, we can predict the final exam score of a student based on their total hours studied and whether or not they used a tutor.
For example, a student who studied for 10 hours and used a tutor is expected to receive an exam score of:
Expected exam score = 48.56 + 2.03*(10) + 8.34*(1) = 77.2
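The estimated regression equation is easy to evaluate in a few lines of Python (the function name is just for illustration). Evaluating it at zero hours with no tutor also recovers the intercept interpretation from earlier:

```python
def expected_exam_score(hours_studied, tutor):
    # Estimated regression equation from the table above
    return 48.56 + 2.03 * hours_studied + 8.34 * tutor

# Student who studied 10 hours and used a tutor
print(round(expected_exam_score(10, tutor=1), 2))

# Student who studied 0 hours and did not use a tutor -- this is the intercept
print(round(expected_exam_score(0, tutor=0), 2))
```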
Considering Correlation When Interpreting Regression Coefficients
It’s important to keep in mind that predictor variables can influence each other in a regression model. For example, most predictor variables will be at least somewhat related to one another (e.g. perhaps a student who studies more is also more likely to use a tutor).
This means that regression coefficients will change when predictor variables are added to or removed from the model.
One good way to see whether or not the correlation between predictor variables is severe enough to influence the regression model in a serious way is to check the variance inflation factor (VIF) for each predictor variable. This will tell you whether or not the correlation between predictor variables is a problem that should be addressed before you decide to interpret the regression coefficients.
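For each predictor, the VIF is 1 / (1 − R²), where R² comes from regressing that predictor on all the others. The sketch below computes VIFs with NumPy on hypothetical data (values near 1 indicate little collinearity; values above 5 or 10 are common warning thresholds):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of predictor matrix X.

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j on the remaining columns (with an intercept).
    """
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid.var() / y.var()
        vifs.append(1 / (1 - r2))
    return vifs

# Hypothetical predictors: hours studied and tutor use, mildly correlated
# (students who study more are assumed somewhat more likely to use a tutor)
rng = np.random.default_rng(1)
hours = rng.uniform(0, 20, 100)
tutor = (hours + rng.normal(0, 10, 100) > 10).astype(float)

print(vif(np.column_stack([hours, tutor])))
```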
If you are running a simple linear regression model with only one predictor, then correlated predictor variables will not be a problem.