
How to Create a Q-Q Plot in ggplot2 (With Example)

by Erma Khan

A Q-Q plot, short for “quantile-quantile” plot, is used to assess whether or not a set of data potentially came from some theoretical distribution.

In most cases, this type of plot is used to determine whether or not a set of data follows a normal distribution.

If the data is normally distributed, the points in a Q-Q plot will lie approximately along a straight diagonal line.

Conversely, if the points deviate significantly from the straight diagonal line, then it’s less likely that the data is normally distributed.
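To make the idea concrete, here is a minimal base R sketch of how the points in a normal Q-Q plot can be computed by hand. The sample y and the variable names are hypothetical and used only for illustration, and this is an approximation of what a Q-Q plot does rather than the exact ggplot2 internals:

```r
#make this example reproducible
set.seed(1)

#hypothetical sample used only for illustration
y <- rnorm(200)
n <- length(y)

#sample quantiles: the data sorted from smallest to largest
sample_q <- sort(y)

#theoretical quantiles: standard normal quantiles at n evenly spaced probabilities
theoretical_q <- qnorm(ppoints(n))

#for normal data, the two sets of quantiles are almost perfectly linearly related
qq_cor <- cor(theoretical_q, sample_q)
qq_cor
```

Plotting sample_q against theoretical_q gives the points of the Q-Q plot. A correlation close to 1 corresponds to points that fall close to a straight line.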

To create a Q-Q plot in ggplot2, you can use the stat_qq() and stat_qq_line() functions as follows:

library(ggplot2)

ggplot(df, aes(sample=y)) +
  stat_qq() + 
  stat_qq_line()

The following examples show how to use this syntax to create a Q-Q plot in two different scenarios.

Example 1: Q-Q Plot for Normal Data

The following code shows how to generate a normally distributed dataset with 200 observations and create a Q-Q plot for the dataset in R:

library(ggplot2)

#make this example reproducible
set.seed(1)

#create some fake data that follows a normal distribution
df <- data.frame(y=rnorm(200))

#create Q-Q plot
ggplot(df, aes(sample=y)) +
  stat_qq() + 
  stat_qq_line()

Q-Q plot in ggplot2

We can see that the points lie mostly along the straight diagonal line with some minor deviations along each of the tails.

Based on this plot, it is reasonable to treat this set of data as approximately normally distributed.

Note that we could also use the color and size arguments to change the color and size of the points in the plot if we’d like to:

library(ggplot2)

#make this example reproducible
set.seed(1)

#create some fake data that follows a normal distribution
df <- data.frame(y=rnorm(200))

#create Q-Q plot
ggplot(df, aes(sample=y)) +
  stat_qq(size=2.5, color='red') + 
  stat_qq_line()
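The reference line can be styled in a similar way. The following is a small sketch, assuming we want a dashed line and explicit axis labels. The specific colors and label text are arbitrary choices for illustration:

```r
library(ggplot2)

#make this example reproducible
set.seed(1)

#create some fake data that follows a normal distribution
df <- data.frame(y=rnorm(200))

#create Q-Q plot with custom points, a dashed reference line, and axis labels
p <- ggplot(df, aes(sample=y)) +
  stat_qq(size=2.5, color='steelblue') +
  stat_qq_line(color='black', linetype='dashed') +
  labs(x='Theoretical Quantiles', y='Sample Quantiles')

p
```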

Example 2: Q-Q Plot for Non-Normal Data

The following code shows how to create a Q-Q plot for a dataset that follows an exponential distribution with 200 observations:

library(ggplot2)

#make this example reproducible
set.seed(1)

#create some fake data that follows an exponential distribution
df <- data.frame(y=rexp(200, rate=3))

#create Q-Q plot
ggplot(df, aes(sample=y)) +
  stat_qq() + 
  stat_qq_line()

We can see that the points deviate significantly from the straight diagonal line. This is a clear indication that the set of data is not normally distributed.

This should make sense considering we specified that the data should follow an exponential distribution.
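Since a Q-Q plot can compare data against any theoretical distribution, we can also check this same dataset against an exponential distribution instead of a normal one. The following is a sketch that uses the distribution and dparams arguments of stat_qq() and stat_qq_line(). It assumes the rate of 3 is known, which is only true here because we generated the data ourselves:

```r
library(ggplot2)

#make this example reproducible
set.seed(1)

#create some fake data that follows an exponential distribution
df <- data.frame(y=rexp(200, rate=3))

#create Q-Q plot against exponential quantiles (rate assumed known)
p <- ggplot(df, aes(sample=y)) +
  stat_qq(distribution=stats::qexp, dparams=list(rate=3)) +
  stat_qq_line(distribution=stats::qexp, dparams=list(rate=3))

p
```

In this version we would expect the points to fall much closer to the reference line, since the theoretical distribution now matches the one used to generate the data.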

Additional Resources

The following tutorials explain how to perform other common tasks in ggplot2:

How to Plot Multiple Lines in ggplot2
How to Plot Mean and Standard Deviation in ggplot2
How to Change Line Colors in ggplot2
