A pairs plot is a matrix of scatterplots that lets you understand the pairwise relationship between different variables in a dataset.
Fortunately, it’s easy to create a pairs plot in R by using the base R pairs() function. This tutorial provides several examples of how to use this function in practice.
Example 1: Pairs Plot of All Variables
The following code illustrates how to create a basic pairs plot for all variables in a data frame in R:
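#make this example reproducible
set.seed(0)
#create data frame (simulated example data)
var1 <- rnorm(1000)
var2 <- var1 + rnorm(1000, 0, 2)
var3 <- var2 - rnorm(1000, 0, 5)
df <- data.frame(var1, var2, var3)
#create pairs plot
pairs(df)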
The way to interpret the matrix is as follows:
- The variable names are shown in the boxes along the diagonal.
- All other boxes display a scatterplot of the relationship between each pairwise combination of variables. In the box in row i and column j, the variable in column j is on the x-axis and the variable in row i is on the y-axis. For example, the box in the top right corner of the matrix displays a scatterplot of var1 (y-axis) against var3 (x-axis). The box in the middle left displays a scatterplot of var2 (y-axis) against var1 (x-axis), and so on.
This single plot gives us an idea of the relationship between each pair of variables in our dataset. For example, var1 and var2 seem to be positively correlated while var1 and var3 seem to have little to no correlation.
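To check the visual impression with numbers, you can compute the correlation matrix with base R's cor() function. The sketch below re-creates a simulated data frame so that it runs on its own; the data-generating lines are an assumption for illustration, so the exact coefficients depend on the data you actually plot.

```r
# Simulated example data (assumed for illustration): var2 is built from var1,
# and var3 adds considerably more noise on top of var2
set.seed(0)
var1 <- rnorm(1000)
var2 <- var1 + rnorm(1000, 0, 2)
var3 <- var2 - rnorm(1000, 0, 5)
df <- data.frame(var1, var2, var3)

# Pearson correlation matrix for every pair of columns, rounded for readability
cor_mat <- round(cor(df), 3)
print(cor_mat)

# Correlation of var1 with each of the other two variables
print(cor_mat["var1", c("var2", "var3")])
```

A noticeably larger coefficient for var1 and var2 than for var1 and var3 matches what the scatterplots suggest.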
Example 2: Pairs Plot of Specific Variables
The following code illustrates how to create a basic pairs plot for just the first two variables in a dataset:
#create pairs plot for var1 and var2 only
pairs(df[, 1:2])
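Selecting columns by position works, but selecting them by name keeps the plot correct even if the column order changes. pairs() also accepts a one-sided formula, so the same plot can be written that way. The small simulated data frame below is only an assumption so the snippet runs on its own.

```r
# Simulated example data (assumed for illustration)
set.seed(0)
df <- data.frame(var1 = rnorm(100), var2 = rnorm(100), var3 = rnorm(100))

# select the columns to plot by name instead of by position
pairs(df[, c("var1", "var3")])

# the same plot using the formula interface of pairs()
pairs(~ var1 + var3, data = df)
```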
Example 3: Modify the Aesthetics of a Pairs Plot
The following code illustrates how to modify the aesthetics of a pairs plot, including the title, the color, and the labels:
pairs(df,
      col = 'blue', #modify color
      labels = c('First', 'Second', 'Third'), #modify labels
      main = 'Custom Title') #modify title
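A few other arguments are handy for tidying up the plot. In the sketch below, pch and cex are passed through to the points drawn in each panel, and lower.panel = NULL hides the boxes below the diagonal, since they show the same pairs as the boxes above it with the axes swapped. The simulated data is an assumption for illustration.

```r
# Simulated example data (assumed for illustration)
set.seed(0)
df <- data.frame(var1 = rnorm(100), var2 = rnorm(100), var3 = rnorm(100))

pairs(df,
      col = 'blue',                             # point color
      pch = 19,                                 # solid circles instead of open ones
      cex = 0.6,                                # smaller points
      labels = c('First', 'Second', 'Third'),   # custom variable labels
      main = 'Custom Title',                    # plot title
      lower.panel = NULL)                       # hide the mirrored lower half
```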
Example 4: Obtaining Correlations with ggpairs
You can also obtain the Pearson correlation coefficient between variables by using the ggpairs() function from the GGally package. The following code illustrates how to use this function:
#install necessary libraries
install.packages('ggplot2')
install.packages('GGally')
#load libraries
library(ggplot2)
library(GGally)
#create pairs plot
ggpairs(df)
The way to interpret this matrix is as follows:
- The variable names are displayed on the outer edges of the matrix.
- The boxes along the diagonal display a density plot for each variable.
- The boxes below the diagonal display a scatterplot for each pair of variables.
- The boxes above the diagonal display the Pearson correlation coefficient for each pair of variables. For example, the correlation between var1 and var2 is 0.425.
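For reference, the Pearson correlation coefficient is the standard sample statistic below, for two variables x and y observed as n pairs (this is also what cor() in R computes by default, with method = "pearson"):

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

It always lies between -1 and 1: values near 1 mean the two variables tend to increase together along a straight line, values near -1 mean one tends to decrease as the other increases, and values near 0 mean there is little linear relationship.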
The benefit of using ggpairs() over the base R function pairs() is that it shows more information about the variables by default. Specifically, you can see the correlation coefficient for each pairwise combination of variables as well as a density plot for each individual variable.
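If you prefer to stay in base R, pairs() can show similar information through custom panel functions, at the cost of writing them yourself. The sketch below prints the Pearson correlation in the upper panels and draws a kernel density curve on the diagonal, using the upper.panel and diag.panel arguments of pairs(). The helper names panel_cor and panel_density are hypothetical, and the simulated data is an assumption for illustration.

```r
# Simulated example data (assumed for illustration)
set.seed(0)
var1 <- rnorm(1000)
var2 <- var1 + rnorm(1000, 0, 2)
var3 <- var2 - rnorm(1000, 0, 5)
df <- data.frame(var1, var2, var3)

# Hypothetical helper for the upper panels: print the Pearson correlation
panel_cor <- function(x, y, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))          # restore the panel coordinates afterwards
  par(usr = c(0, 1, 0, 1))         # use unit coordinates to center the text
  text(0.5, 0.5, format(cor(x, y), digits = 3))
}

# Hypothetical helper for the diagonal panels: draw a kernel density curve
panel_density <- function(x, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))
  d <- density(x)
  par(usr = c(usr[1:2], 0, max(d$y) * 1.1))   # keep the x range, rescale y to the density
  lines(d)
}

pairs(df,
      upper.panel = panel_cor,
      diag.panel = panel_density)
```

The variable names are still drawn on the diagonal on top of the density curves, because pairs() adds its text labels after calling diag.panel.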
You can find the complete documentation for the ggpairs() function in the GGally package documentation, for example by running ?ggpairs in R after loading GGally.