Regression models are used to quantify the relationship between one or more predictor variables and a response variable.
Whenever we fit a regression model, we want to understand how well the model is able to use the values of the predictor variables to predict the value of the response variable.
Two metrics we often use to quantify how well a model fits a dataset are the mean absolute error (MAE) and the root mean squared error (RMSE), which are calculated as follows:
MAE: A metric that tells us the mean absolute difference between the predicted values and the actual values in a dataset. The lower the MAE, the better a model fits a dataset.
MAE = 1/n * Σ|yi – ŷi|
where:
- Σ is a symbol that means “sum”
- yi is the observed value for the ith observation
- ŷi is the predicted value for the ith observation
- n is the sample size
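As a rough sketch, the MAE formula above can be written in a few lines of plain Python. The function name mean_absolute_error and the sample values are assumptions made for illustration only; they are not taken from a particular library or from the basketball dataset in this article.

```python
def mean_absolute_error(actual, predicted):
    """Return the mean absolute error of two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    # (1/n) * sum of |y_i - y_hat_i| over all n observations
    total = sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted))
    return total / len(actual)


# Hypothetical values, chosen only to keep the arithmetic easy to follow
actual = [10, 12, 15, 20]
predicted = [12, 11, 15, 16]

mae = mean_absolute_error(actual, predicted)
print(mae)  # absolute errors are 2, 1, 0, 4, so this prints 1.75
```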
RMSE: A metric that tells us the square root of the average squared difference between the predicted values and the actual values in a dataset. The lower the RMSE, the better a model fits a dataset.
It is calculated as:
RMSE = √( Σ(yi – ŷi)² / n )
where:
- Σ is a symbol that means “sum”
- ŷi is the predicted value for the ith observation
- yi is the observed value for the ith observation
- n is the sample size
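The RMSE formula can be sketched the same way. Again, the function name root_mean_squared_error and the sample values are hypothetical and exist only to show the order of operations: square each difference, average the squares, then take the square root.

```python
import math


def root_mean_squared_error(actual, predicted):
    """Return the root mean squared error of two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    # Average the squared differences first, then take the square root
    mean_squared = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)
    return math.sqrt(mean_squared)


# Hypothetical values, chosen only to keep the arithmetic easy to follow
actual = [10, 12, 15, 20]
predicted = [12, 11, 15, 16]

rmse = root_mean_squared_error(actual, predicted)
print(round(rmse, 4))  # squared errors are 4, 1, 0, 16, so this prints 2.2913
```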
Example: Calculating RMSE & MAE
Suppose we use a regression model to predict the number of points that 10 players will score in a basketball game.
The following table shows the predicted points from the model vs. the actual points the players scored:
Using the MAE Calculator, we can calculate the MAE to be 3.2.
This tells us that the mean absolute difference between the predicted values made by the model and the actual values is 3.2.
Using the RMSE Calculator, we can calculate the RMSE to be 4.
This tells us that the square root of the average squared difference between the predicted points scored and the actual points scored is 4.
Notice that each metric gives us an idea of the typical difference between the predicted value made by the model and the actual value in the dataset, but the interpretation of each metric is slightly different.
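For readers who prefer code to an online calculator, here is a minimal Python sketch of the same calculation. The ten actual and predicted values below are hypothetical: they were chosen only so that the results match the MAE of 3.2 and the RMSE of 4 reported above, and should not be read as the exact figures from the table.

```python
import math

# Hypothetical points for 10 players (assumed values, see the note above)
actual = [14, 15, 18, 19, 25, 18, 12, 12, 15, 22]
predicted = [12, 15, 20, 16, 20, 19, 16, 20, 16, 16]

n = len(actual)
errors = [y - y_hat for y, y_hat in zip(actual, predicted)]

# MAE: mean of the absolute errors
mae = sum(abs(e) for e in errors) / n

# RMSE: square root of the mean of the squared errors
rmse = math.sqrt(sum(e ** 2 for e in errors) / n)

print(f"MAE: {mae}")    # MAE: 3.2
print(f"RMSE: {rmse}")  # RMSE: 4.0
```

Both numbers are in the same units as the response variable (points), which is why they can both be read as a "typical" prediction error.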
RMSE vs. MAE: Which Metric Should You Use?
If you would like to give more weight to large errors (i.e. if being “off” by 20 is more than twice as bad as being “off” by 10), then it’s better to use the RMSE, because the RMSE is more sensitive to large errors.
However, if being “off” by 20 is exactly twice as bad as being “off” by 10, then it’s better to use the MAE.
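One way to see the difference is to compare two hypothetical sets of errors with the same total absolute error: one where both predictions are off by 10, and one where a single prediction is off by 20. The helper names mae and rmse below are assumptions for this sketch and take a list of errors directly.

```python
import math


def mae(errors):
    # Mean of the absolute errors
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    # Square root of the mean of the squared errors
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


even_errors = [10, 10]    # two predictions, each off by 10
uneven_errors = [0, 20]   # one perfect prediction, one off by 20

mae_even, mae_uneven = mae(even_errors), mae(uneven_errors)
rmse_even, rmse_uneven = rmse(even_errors), rmse(uneven_errors)

print(mae_even, mae_uneven)               # 10.0 10.0
print(rmse_even, round(rmse_uneven, 4))   # 10.0 14.1421
```

The MAE treats the two cases as equally bad, while the RMSE penalizes the case that contains the single large error.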
To illustrate this, suppose we have one player who is a clear outlier in their number of points scored:
Using the online calculators mentioned earlier, we can calculate the MAE and RMSE to be:
- MAE: 8
- RMSE: 17.4356
Notice that the RMSE increases much more than the MAE.
This is because RMSE uses squared differences in its formula, and the squared difference between the observed value of 76 and the predicted value of 22 is quite large: (76 – 22)² = 2,916. This causes the value for RMSE to increase significantly.
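The effect of a single outlier can be sketched in Python as follows. The baseline values are hypothetical stand-ins for ten players; only the outlier pair (observed 76, predicted 22) comes from the text above, and it is assumed here to replace the last player. The printed numbers therefore depend on those assumed values.

```python
import math


def mae(actual, predicted):
    # Mean of the absolute differences
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    # Square root of the mean of the squared differences
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))


# Hypothetical baseline for 10 players (assumed values)
actual = [14, 15, 18, 19, 25, 18, 12, 12, 15, 22]
predicted = [12, 15, 20, 16, 20, 19, 16, 20, 16, 16]

# Same data, but the last player becomes the outlier: observed 76, predicted 22
actual_outlier = actual[:-1] + [76]
predicted_outlier = predicted[:-1] + [22]

mae_base, rmse_base = mae(actual, predicted), rmse(actual, predicted)
mae_out, rmse_out = mae(actual_outlier, predicted_outlier), rmse(actual_outlier, predicted_outlier)

print(f"MAE:  {mae_base:.4f} -> {mae_out:.4f}")
print(f"RMSE: {rmse_base:.4f} -> {rmse_out:.4f}")

# How many times larger each metric became after adding the outlier
mae_growth = mae_out / mae_base
rmse_growth = rmse_out / rmse_base
```

With these assumed values, the MAE grows by a factor of 2.5 while the RMSE grows by a factor of more than 4, because the one error of 54 contributes 2,916 to the sum of squared errors.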
In practice, we typically fit several regression models to a dataset and calculate just one of these metrics for each model.
For example, we might fit three different regression models and calculate the RMSE for each model. We would then select the model with the lowest RMSE value as the “best” model because it is the one that makes predictions that are closest to the actual values from the dataset.
Whichever metric you choose, make sure to calculate the same metric for each model. For example, don’t calculate MAE for one model and RMSE for another model and then compare those two metrics.
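The model comparison described above might look like this in Python. The three sets of predictions and the names model_a, model_b, and model_c are hypothetical; in practice each list would come from a fitted regression model evaluated on the same observations.

```python
import math


def rmse(actual, predicted):
    # Square root of the mean of the squared differences
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))


# Hypothetical observed values and predictions from three candidate models
actual = [14, 15, 18, 19, 25]
predictions = {
    "model_a": [12, 15, 20, 16, 20],
    "model_b": [14, 16, 17, 20, 23],
    "model_c": [10, 18, 22, 15, 30],
}

# Score every model with the SAME metric so the values are comparable
scores = {name: rmse(actual, preds) for name, preds in predictions.items()}

# The model with the lowest RMSE is the one whose predictions are closest to the data
best_model = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: RMSE = {score:.4f}")
print(f"Best model: {best_model}")  # Best model: model_b
```

Swapping rmse for an MAE function would work equally well, as long as the same function is applied to every model.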
Additional Resources
The following tutorials explain how to calculate MAE using different statistical software:
How to Calculate Mean Absolute Error in Excel
How to Calculate Mean Absolute Error in R
How to Calculate Mean Absolute Error in Python
The following tutorials explain how to calculate RMSE using different statistical software:
How to Calculate Root Mean Square Error in Excel
How to Calculate Root Mean Square Error in R
How to Calculate Root Mean Square Error in Python