You can use the following methods to calculate the standard deviation of columns in a data frame in R using dplyr:
Method 1: Calculate Standard Deviation of One Variable
library(dplyr)

df %>%
  summarise(sd_var1 = sd(var1, na.rm=TRUE))
Method 2: Calculate Standard Deviation of Multiple Variables
library(dplyr)

df %>%
  summarise(sd_var1 = sd(var1, na.rm=TRUE),
            sd_var2 = sd(var2, na.rm=TRUE))
Method 3: Calculate Standard Deviation of Multiple Variables, Grouped by Another Variable
library(dplyr)

df %>%
  group_by(var3) %>%
  summarise(sd_var1 = sd(var1, na.rm=TRUE),
            sd_var2 = sd(var2, na.rm=TRUE))
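Each of the templates above passes na.rm=TRUE to sd(). As a minimal sketch of why that matters, the following code uses a small hypothetical data frame named scores (not part of this tutorial's data) that contains one missing value. Without na.rm=TRUE, sd() returns NA whenever the column contains an NA:

```r
library(dplyr)

#hypothetical data frame with one missing value (for illustration only)
scores <- data.frame(points=c(12, 15, NA, 22))

#compare sd() with and without removing missing values
na_demo <- scores %>%
  summarise(sd_default = sd(points),
            sd_na_rm = sd(points, na.rm=TRUE))

na_demo
```

Here sd_default is NA, while sd_na_rm is the standard deviation of the three non-missing values.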
This tutorial explains how to use each method in practice with the following data frame in R:
#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(12, 15, 18, 22, 14, 17, 29, 35),
                 assists=c(4, 4, 3, 6, 7, 8, 3, 10))

#view data frame
df

  team points assists
1    A     12       4
2    A     15       4
3    A     18       3
4    A     22       6
5    B     14       7
6    B     17       8
7    B     29       3
8    B     35      10
Example 1: Calculate Standard Deviation of One Variable
The following code shows how to calculate the standard deviation of the points variable:
library(dplyr)

#calculate standard deviation of points variable
df %>%
  summarise(sd_points = sd(points, na.rm=TRUE))

  sd_points
1  7.995534
From the output we can see that the standard deviation of values for the points variable is 7.995534.
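Note that sd() in R computes the sample standard deviation, which divides by n - 1 rather than n. As a rough check, the sketch below recomputes the value by hand in base R from the same points values. The names n and manual_sd are introduced here for illustration only:

```r
#points values from the example data frame
points <- c(12, 15, 18, 22, 14, 17, 29, 35)

#sample standard deviation computed by hand (n - 1 in the denominator)
n <- length(points)
manual_sd <- sqrt(sum((points - mean(points))^2) / (n - 1))

#check that the manual value agrees with sd(), up to floating-point rounding
all.equal(manual_sd, sd(points))
```

If you need the population standard deviation instead, one option is to replace n - 1 with n in the formula above.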
Example 2: Calculate Standard Deviation of Multiple Variables
The following code shows how to calculate the standard deviation of the points and the assists variables:
library(dplyr)

#calculate standard deviation of points and assists variables
df %>%
  summarise(sd_points = sd(points, na.rm=TRUE),
            sd_assists = sd(assists, na.rm=TRUE))

  sd_points sd_assists
1  7.995534   2.559994
The output displays the standard deviation for both the points and assists variables.
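When there are many columns, writing one sd() call per variable becomes repetitive. One possible alternative, assuming dplyr 1.0.0 or later is installed, is the across() helper, sketched below on the same data. The name sd_all is introduced here for illustration only:

```r
library(dplyr)

#same data frame as in the examples above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(12, 15, 18, 22, 14, 17, 29, 35),
                 assists=c(4, 4, 3, 6, 7, 8, 3, 10))

#apply sd() to each selected column and name the results sd_<column>
sd_all <- df %>%
  summarise(across(c(points, assists),
                   ~ sd(.x, na.rm=TRUE),
                   .names = "sd_{.col}"))

sd_all
```

This should produce the same sd_points and sd_assists columns as the code above.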
Example 3: Calculate Standard Deviation of Multiple Variables, Grouped by Another Variable
The following code shows how to calculate the standard deviation of the points and the assists variables, grouped by the team variable:
library(dplyr)

#calculate standard deviation of points and assists variables, grouped by team
df %>%
  group_by(team) %>%
  summarise(sd_points = sd(points, na.rm=TRUE),
            sd_assists = sd(assists, na.rm=TRUE))

# A tibble: 2 x 3
  team  sd_points sd_assists
1 A          4.27       1.26
2 B          9.91       2.94
The output displays the standard deviation for both the points and assists variables for team A and team B.
Note: To group by multiple variables, pass several column names to the group_by() function, separated by commas.
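As a minimal sketch of grouping by more than one variable, the code below adds a hypothetical position column, which is not part of the original data frame, and groups by both team and position. The names df2 and sd_by_group are also introduced for illustration only, and the .groups argument assumes dplyr 1.0.0 or later:

```r
library(dplyr)

#example data with a hypothetical position column added for illustration
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                  position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                  points=c(12, 15, 18, 22, 14, 17, 29, 35))

#calculate standard deviation of points for each team and position combination
sd_by_group <- df2 %>%
  group_by(team, position) %>%
  summarise(sd_points = sd(points, na.rm=TRUE), .groups = "drop")

sd_by_group
```

The result contains one row per combination of team and position, four rows in this case.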
Additional Resources
The following tutorials explain how to perform other common tasks in R:
How to Filter for Unique Values Using dplyr
How to Filter by Multiple Conditions Using dplyr
How to Count Number of Occurrences in Columns in R