A Pareto chart is a type of chart that displays the frequency of each category as bars sorted in descending order, along with a line that shows the cumulative frequency across categories.
This tutorial provides a step-by-step example of how to create a Pareto chart in R.
Step 1: Create the Data
Suppose we conduct a survey in which we ask 350 different people to identify their favorite cereal brand among brands A, B, C, D, E, and F.
The following dataset shows the total votes for each brand:
#create data
df <- data.frame(favorite=c('A', 'B', 'C', 'D', 'E', 'F'),
                 count=c(140, 97, 58, 32, 17, 6))

#view data
df

  favorite count
1        A   140
2        B    97
3        C    58
4        D    32
5        E    17
6        F     6
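In practice, survey results often arrive as one response per person rather than as ready-made totals. The sketch below shows one way to tally such responses into the same structure as df using base R. The responses vector is hypothetical and is generated here only so that it matches the counts above.

```r
# Hypothetical raw survey data: one entry per respondent (350 in total),
# generated so that it matches the counts in this tutorial
responses <- rep(c('A', 'B', 'C', 'D', 'E', 'F'),
                 times = c(140, 97, 58, 32, 17, 6))

# Tally the number of votes per brand
votes <- table(responses)

# Convert the tally to a data frame with the same column names as df
df_from_raw <- as.data.frame(votes, stringsAsFactors = FALSE)
names(df_from_raw) <- c('favorite', 'count')

# View the tallied data
df_from_raw
```

The resulting data frame has one row per brand, so the remaining steps apply to it unchanged.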
Step 2: Create the Pareto Chart
To create a Pareto chart to visualize the results of this survey, we can use the pareto.chart() function from the qcc package:
library(qcc)

#create Pareto chart
pareto.chart(df$count)

Pareto chart analysis for df$count
     Frequency  Cum.Freq. Percentage Cum.Percent.
  A 140.000000 140.000000  40.000000    40.000000
  B  97.000000 237.000000  27.714286    67.714286
  C  58.000000 295.000000  16.571429    84.285714
  D  32.000000 327.000000   9.142857    93.428571
  E  17.000000 344.000000   4.857143    98.285714
  F   6.000000 350.000000   1.714286   100.000000
The table in the output shows us the frequency and cumulative frequency of each brand. For example:
- Frequency of brand A: 140 | Cumulative frequency: 140
- Frequency of brand B: 97 | Cumulative frequency of A, B: 237
- Frequency of brand C: 58 | Cumulative frequency of A, B, C: 295
And so on.
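The figures in this table can be reproduced by hand, which helps clarify what each column means. The following is a minimal sketch in base R. It does not use the qcc package, and the names counts and pareto_table are ours, chosen for illustration.

```r
# Survey counts from Step 1, named by brand
counts <- c(A = 140, B = 97, C = 58, D = 32, E = 17, F = 6)

# A Pareto chart orders categories from most to least frequent
counts <- sort(counts, decreasing = TRUE)

# Rebuild the four columns reported in the output table
pareto_table <- data.frame(
  Frequency   = counts,                              # votes per brand
  Cum.Freq    = cumsum(counts),                      # running total of votes
  Percentage  = 100 * counts / sum(counts),          # share of all votes
  Cum.Percent = 100 * cumsum(counts) / sum(counts)   # running share of votes
)

# View the table, rounded for readability
print(round(pareto_table, 2))
```

Each cumulative value is the sum of the current frequency and all frequencies above it, so the last row always reaches the total of 350 votes, or 100 percent.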
Step 3: Modify the Pareto Chart (Optional)
The following code shows how to modify the title of the chart along with the color palette used:
pareto.chart(df$count, main='Pareto Chart for Favorite Cereal Brands', col=heat.colors(length(df$count)))
You can find a complete list of color palettes available in this R Color Cheat Sheet.
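Two further adjustments may be worth considering. First, the labels A through F in Step 2 appear to be default labels assigned to an unnamed vector, and they only match the brand names here by coincidence, so it is safer to attach the names explicitly. Second, any palette function that returns a vector of colors can be swapped in for heat.colors(). The sketch below assumes the df from Step 1 and uses terrain.colors() from base R. The ylab argument is assumed to be passed through to the plot as described in the qcc documentation, and the call is wrapped in a check so that the code still runs if qcc is not installed.

```r
# Data from Step 1
df <- data.frame(favorite = c('A', 'B', 'C', 'D', 'E', 'F'),
                 count = c(140, 97, 58, 32, 17, 6))

# Attach the brand names to the counts so the chart labels
# come from the data rather than from default labels
counts <- setNames(df$count, as.character(df$favorite))

# Use a different built-in palette, one color per brand
palette_cols <- terrain.colors(length(counts))

# Draw the chart only if the qcc package is available
if (requireNamespace('qcc', quietly = TRUE)) {
  qcc::pareto.chart(counts,
                    main = 'Pareto Chart for Favorite Cereal Brands',
                    ylab = 'Votes',        # assumed to relabel the left axis
                    col = palette_cols)
}
```

With named counts, the chart keeps the correct labels even if the brands have names other than single letters or the rows of df are not already sorted.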