Home » Multivariate Adaptive Regression Splines in R

Multivariate Adaptive Regression Splines in R

by Erma Khan

Multivariate adaptive regression splines (MARS) can be used to model nonlinear relationships between a set of predictor variables and a response variable.

This method works as follows:

1. Divide a dataset into k pieces.

2. Fit a regression model to each piece.

3. Use k-fold cross-validation to choose a value for k.

This tutorial provides a step-by-step example of how to fit a MARS model to a dataset in R.

Step 1: Load Necessary Packages

For this example we’ll use the Wage dataset from the ISLR package, which contains the annual wages for 3,000 individuals along with a variety of predictor variables like age, education, race, and more.

Before we fit a MARS model to the data, we’ll load the necessary packages:

library(ISLR)      #contains Wage dataset
library(dplyr)     #data wrangling
library(ggplot2)   #plotting
library(earth)     #fitting MARS models
library(caret)     #tuning model parameters

Step 2: View Data

Next, we’ll view the first six rows of the dataset we’re working with:

#view first six rows of data
head(Wage)

       year age           maritl     race       education             region
231655 2006  18 1. Never Married 1. White    1. =Very Good 2. No       4.255273     70.47602
161300  1. Industrial      1. =Very Good 1. Yes      5.041393     154.68529
11443   2. Information     1. =Very Good 1. Yes      4.845098     127.11574

Step 3: Build & Optimize the MARS Model

Next, we’ll build the MARS model for this dataset and perform k-fold cross-validation to determine which model produces the lowest test RMSE (root mean squared error).

#create a tuning grid
hyper_grid grid(degree = 1:3,
                          nprune = seq(2, 50, length.out = 10) %>%
                          floor())

#make this example reproducible
set.seed(1)

#fit MARS model using k-fold cross-validation
cv_mars earth",
  metric = "RMSE",
  trControl = trainControl(method = "cv", number = 10),
  tuneGrid = hyper_grid)

#display model with lowest test RMSE
cv_mars$results %>%
  filter(nprune==cv_mars$bestTune$nprune, degree =cv_mars$bestTune$degree)    
degree	nprune	RMSE	 Rsquared   MAE	      RMSESD	RsquaredSD   MAESD		
1	12	33.8164  0.3431804  22.97108  2.240394	0.03064269   1.4554

From the output we can see that the model that produced the lowest test MSE was one with only first-order effects (i.e. no interaction terms) and 12 terms. This model produced a root mean squared error (RMSE) of 33.8164.

Note: We used method=”earth” to specify a MARS model. You can find the documentation for this method here.

We can also create a plot to visualize the test RMSE based on the degree and the number of terms:

#display test RMSE by terms and degree
ggplot(cv_mars)

MARS model in R

In practice we would fit a MARS model along with several other types of models like:

We would then compare each model to determine which one lead to the lowest test error and choose that model as the optimal one to use.

The complete R code used in this example can be found here.

Related Posts