In regression analysis, Mallows’ Cp is a metric that is used to pick the best regression model among several potential models.
The “best” regression model is the one with the lowest Cp value that is also close to p + 1, where p is the number of predictor variables in the model.
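For reference, the rule above follows from how the statistic is usually defined. The formula below is the common textbook form and is stated here as an assumption, since this tutorial does not define it: SSE_p is the residual sum of squares of the candidate model with p predictors, \hat{\sigma}^2_{full} is the mean squared error of the full model, and n is the number of observations.

```latex
C_p = \frac{SSE_p}{\hat{\sigma}^2_{\text{full}}} - n + 2(p + 1)
```

Under this definition, a candidate model with little bias has an expected Cp of roughly p + 1, which is why values near p + 1 are preferred and values far above it suggest missing predictors.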
The easiest way to calculate Mallows’ Cp in R is to use the ols_mallows_cp() function from the olsrr package.
The following example shows how to use this function to calculate Mallows’ Cp to pick the best regression model among several potential models in R.
Example: Calculating Mallows’ Cp in R
Suppose we would like to fit three different multiple linear regression models using variables from the mtcars dataset.
We will compare a full model against three smaller candidate models:
- Predictor variables in Full Model: All 10 predictor variables
- Predictor variables in Model 1: disp, hp, wt, qsec
- Predictor variables in Model 2: disp, qsec
- Predictor variables in Model 3: disp, wt
The following code shows how to fit each of these regression models and use the ols_mallows_cp() function to calculate the Mallows’ Cp of each model:
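library(olsrr)

#fit full model (mpg as the response, all 10 other variables as predictors)
full_model <- lm(mpg ~ ., data = mtcars)

#fit three smaller models
model1 <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
model2 <- lm(mpg ~ disp + qsec, data = mtcars)
model3 <- lm(mpg ~ disp + wt, data = mtcars)

#calculate Mallows' Cp for each model
ols_mallows_cp(model1, full_model)
[1] 4.430434

ols_mallows_cp(model2, full_model)
[1] 18.64082

ols_mallows_cp(model3, full_model)
[1] 9.122225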
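As a cross-check, Mallows’ Cp can also be computed by hand in base R. This is a minimal sketch that assumes the usual definition, Cp = SSE_p / MSE_full - n + 2(p + 1), and that mpg is the response variable; the helper name mallows_cp_manual is hypothetical and is not part of olsrr.

```r
# Hypothetical helper (not part of olsrr): Mallows' Cp from the standard formula
mallows_cp_manual <- function(model, full_model) {
  n <- nobs(model)                    # number of observations
  k <- length(coef(model))            # p + 1: predictors plus the intercept
  sse <- sum(residuals(model)^2)      # residual sum of squares of the candidate model
  # mean squared error of the full model
  mse_full <- sum(residuals(full_model)^2) / df.residual(full_model)
  sse / mse_full - n + 2 * k
}

# Assumption: mpg is the response, as in the example models
full_model <- lm(mpg ~ ., data = mtcars)
model3 <- lm(mpg ~ disp + wt, data = mtcars)

cp3 <- mallows_cp_manual(model3, full_model)
cp_full <- mallows_cp_manual(full_model, full_model)

print(cp3)
print(cp_full)
```

For the full model this formula always returns the number of estimated coefficients (11 here), and for Model 3 it should agree with the ols_mallows_cp() value of about 9.12 reported in this example.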
Here’s how to interpret the output:
- Model 1: p + 1 = 5, Mallows’ Cp = 4.43
- Model 2: p + 1 = 3, Mallows’ Cp = 18.64
- Model 3: p + 1 = 3, Mallows’ Cp = 9.12
We can see that Model 1 has both the lowest Mallows’ Cp and the value closest to p + 1, which indicates that it is the least biased of the three candidate models and the best choice among them.
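The comparison can also be tabulated rather than read off by eye. The sketch below simply copies the Cp values from the output in this example into a data frame and measures how far each one is from p + 1; the column names (target, gap) are illustrative choices.

```r
# Cp values copied from the ols_mallows_cp() output in this example
comparison <- data.frame(
  model = c("Model 1", "Model 2", "Model 3"),
  p     = c(4, 2, 2),                         # number of predictors in each model
  cp    = c(4.430434, 18.64082, 9.122225)
)

# Target value p + 1 and absolute distance of Cp from that target
comparison$target <- comparison$p + 1
comparison$gap <- abs(comparison$cp - comparison$target)

# Model whose Cp is closest to p + 1
best <- comparison$model[which.min(comparison$gap)]

print(comparison)
print(best)
```

Ranking by the gap alone is a simplification: the rule of thumb also prefers a low Cp overall, so both columns are worth inspecting.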
Notes on Mallows’ Cp
Here are a few things to keep in mind regarding Mallows’ Cp:
- If every potential model has a high value for Mallows’ Cp, this is an indication that some important predictor variables are likely missing from each model.
- If several potential models have low values for Mallows’ Cp, choose the model with the lowest value as the best model to use.
Keep in mind that Mallows’ Cp is only one way to identify the “best” regression model among several potential models.
Another commonly used metric is adjusted R-squared, which tells us the proportion of variance in the response variable that can be explained by the predictor variables in the model, adjusted for the number of predictor variables used.
When deciding which regression model is best among a list of several different models, it’s recommended to look at both Mallows’ Cp and adjusted R-squared.
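To look at adjusted R-squared alongside Mallows’ Cp, the value can be pulled from summary() in base R. The sketch below assumes the same three candidate models with mpg as the response, and also recomputes the statistic from its definition as a sanity check.

```r
# Assumption: mpg is the response, matching the models in the example
models <- list(
  model1 = lm(mpg ~ disp + hp + wt + qsec, data = mtcars),
  model2 = lm(mpg ~ disp + qsec, data = mtcars),
  model3 = lm(mpg ~ disp + wt, data = mtcars)
)

# Adjusted R-squared as reported by summary() for each fitted model
adj_r2 <- sapply(models, function(m) summary(m)$adj.r.squared)

# Same quantity from the definition: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r2_manual <- sapply(models, function(m) {
  r2 <- summary(m)$r.squared
  n <- nobs(m)
  p <- length(coef(m)) - 1              # number of predictors (excluding intercept)
  1 - (1 - r2) * (n - 1) / (n - p - 1)
})

print(round(adj_r2, 4))
```

A model that scores well on both metrics, meaning a Cp near p + 1 and a high adjusted R-squared, is a stronger candidate than one that does well on only one of them.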