
How to Calculate Standard Deviation Using dplyr (With Examples)

by Erma Khan

You can use the following methods to calculate the standard deviation of values in a data frame using dplyr:

Method 1: Calculate Standard Deviation of One Variable

library(dplyr)

df %>%
  summarise(sd_var1 = sd(var1, na.rm=TRUE))

Method 2: Calculate Standard Deviation of Multiple Variables

library(dplyr)

df %>%
  summarise(sd_var1 = sd(var1, na.rm=TRUE),
            sd_var2 = sd(var2, na.rm=TRUE))

Method 3: Calculate Standard Deviation of Multiple Variables, Grouped by Another Variable

library(dplyr)

df %>%
  group_by(var3) %>%
  summarise(sd_var1 = sd(var1, na.rm=TRUE),
            sd_var2 = sd(var2, na.rm=TRUE))
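All three methods pass na.rm=TRUE to sd(). The short sketch below illustrates what that argument does, using a small made-up vector (x is hypothetical and not part of the data frame used in this tutorial): without it, a single missing value makes the whole result NA.

```r
# Hypothetical vector with one missing value (not from the tutorial's data)
x <- c(12, 15, NA, 22)

# Without na.rm, the missing value propagates and the result is NA
sd_with_na <- sd(x)

# With na.rm = TRUE, the NA is dropped before the standard deviation is computed
sd_without_na <- sd(x, na.rm = TRUE)

sd_with_na
sd_without_na
```

If a column contains no missing values, na.rm=TRUE has no effect on the result, so including it is harmless.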

This tutorial explains how to use each method in practice with the following data frame in R:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(12, 15, 18, 22, 14, 17, 29, 35),
                 assists=c(4, 4, 3, 6, 7, 8, 3, 10))

#view data frame
df

  team points assists
1    A     12       4
2    A     15       4
3    A     18       3
4    A     22       6
5    B     14       7
6    B     17       8
7    B     29       3
8    B     35      10

Example 1: Calculate Standard Deviation of One Variable

The following code shows how to calculate the standard deviation of the points variable:

library(dplyr)

#calculate standard deviation of points variable
df %>%
  summarise(sd_points = sd(points, na.rm=TRUE))

  sd_points
1  7.995534

From the output we can see that the standard deviation of values for the points variable is 7.995534.
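Note that sd() in R returns the sample standard deviation, which divides by n - 1 rather than n. As a quick check, the sketch below recomputes the value by hand from the same points values (manual_sd is just an illustrative name) and compares it with sd():

```r
# Same points values as in the tutorial's data frame
points <- c(12, 15, 18, 22, 14, 17, 29, 35)

# Sample standard deviation by hand: sqrt of the sum of squared
# deviations from the mean, divided by n - 1
n <- length(points)
manual_sd <- sqrt(sum((points - mean(points))^2) / (n - 1))

manual_sd
sd(points)
```

Both lines should print the same value, matching the sd_points result above.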

Example 2: Calculate Standard Deviation of Multiple Variables

The following code shows how to calculate the standard deviation of the points and the assists variables:

library(dplyr)

#calculate standard deviation of points and assists variables
df %>%
  summarise(sd_points = sd(points, na.rm=TRUE),
            sd_assists = sd(assists, na.rm=TRUE))

  sd_points sd_assists
1  7.995534   2.559994

The output displays the standard deviation for both the points and assists variables.
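When there are many columns, writing one sd() call per variable gets repetitive. One possible alternative, assuming dplyr 1.0.0 or later is installed, is to use across() inside summarise(). The sketch below applies sd() to both columns at once; the .names pattern is chosen here only so that the output columns match the names used above.

```r
library(dplyr)

# Same data frame as in the tutorial
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(12, 15, 18, 22, 14, 17, 29, 35),
                 assists=c(4, 4, 3, 6, 7, 8, 3, 10))

# Apply sd() to each selected column; "sd_{.col}" prefixes each column name
sd_all <- df %>%
  summarise(across(c(points, assists),
                   ~ sd(.x, na.rm=TRUE),
                   .names = "sd_{.col}"))

sd_all
```

This should produce the same two columns, sd_points and sd_assists, as the explicit version.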

Example 3: Calculate Standard Deviation of Multiple Variables, Grouped by Another Variable

The following code shows how to calculate the standard deviation of the points and the assists variables, grouped by the team variable:

library(dplyr)

#calculate standard deviation of points and assists variables, grouped by team
df %>%
  group_by(team) %>%
  summarise(sd_points = sd(points, na.rm=TRUE),
            sd_assists = sd(assists, na.rm=TRUE))

# A tibble: 2 x 3
  team  sd_points sd_assists
  <chr>     <dbl>      <dbl>
1 A          4.27       1.26
2 B          9.91       2.94

The output displays the standard deviation for both the points and assists variables for team A and team B.

Note: To group by multiple variables, pass several column names to the group_by() function, separated by commas.
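As a rough illustration of grouping by more than one variable, the sketch below adds a position column to the data frame. This column and its values are hypothetical and are not part of the original data. The .groups = "drop" argument (available in dplyr 1.0.0 and later) simply returns an ungrouped result.

```r
library(dplyr)

# Tutorial data plus a hypothetical 'position' column added for illustration
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(12, 15, 18, 22, 14, 17, 29, 35),
                 assists=c(4, 4, 3, 6, 7, 8, 3, 10))

# One row per team/position combination
sd_by_group <- df %>%
  group_by(team, position) %>%
  summarise(sd_points = sd(points, na.rm=TRUE),
            sd_assists = sd(assists, na.rm=TRUE),
            .groups = "drop")

sd_by_group
```

The result has one row for each combination of team and position that appears in the data.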

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Filter for Unique Values Using dplyr
How to Filter by Multiple Conditions Using dplyr
How to Count Number of Occurrences in Columns in R
