Often you may want to create a new variable in a data frame in R based on some condition. Fortunately, this is easy to do using the mutate() and case_when() functions from the dplyr package.
This tutorial shows several examples of how to use these functions with the following data frame:
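#create data frame
df <- data.frame(player = c('a', 'b', 'c', 'd', 'e'),
                 position = c('G', 'F', 'F', 'G', 'G'),
                 points = c(12, 15, 19, 22, 32),
                 rebounds = c(5, 7, 7, 12, 11))

#view data frame
df

  player position points rebounds
1      a        G     12        5
2      b        F     15        7
3      c        F     19        7
4      d        G     22       12
5      e        G     32       11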
Example 1: Create New Variable Based on One Existing Variable
The following code shows how to create a new variable called ‘scorer’ based on the value in the points column:
library(dplyr)

#define new variable 'scorer' using mutate() and case_when()
df %>%
  mutate(scorer = case_when(points < 15 ~ 'low',
                            points < 25 ~ 'med',
                            points < 35 ~ 'high'))

  player position points rebounds scorer
1      a        G     12        5    low
2      b        F     15        7    med
3      c        F     19        7    med
4      d        G     22       12    med
5      e        G     32       11   high
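One detail worth seeing in isolation: case_when() checks its conditions from top to bottom and assigns the value of the first one that is TRUE, and a row that matches no condition gets NA. The minimal sketch below, assuming dplyr is installed, re-creates the example data, replaces the last condition with a TRUE ~ 'high' catch-all (a variation, not the tutorial's code), and then runs case_when() on a hypothetical vector containing 40 to show the NA case.

```r
library(dplyr)

# re-create the example data frame from this tutorial
df <- data.frame(player = c('a', 'b', 'c', 'd', 'e'),
                 position = c('G', 'F', 'F', 'G', 'G'),
                 points = c(12, 15, 19, 22, 32),
                 rebounds = c(5, 7, 7, 12, 11))

# conditions are checked in order: 12 is labeled 'low' even though it also satisfies points < 25
# TRUE ~ 'high' is a catch-all (a variation on the tutorial code) for rows no earlier condition matched
scored <- df %>%
  mutate(scorer = case_when(points < 15 ~ 'low',
                            points < 25 ~ 'med',
                            TRUE ~ 'high'))
scored$scorer
#> [1] "low"  "med"  "med"  "med"  "high"

# without a catch-all, a value that matches no condition becomes NA (40 is a hypothetical value)
x <- c(12, 40)
unmatched <- case_when(x < 15 ~ 'low',
                       x < 35 ~ 'high')
unmatched  # "low" followed by NA
```

Ending with TRUE ~ value works in every dplyr version; newer releases also offer a .default argument for the same purpose.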
Example 2: Create New Variable Based on Several Existing Variables
The following code shows how to create a new variable called ‘type’ based on the values in the player and position columns:
library(dplyr)

#define new variable 'type' using mutate() and case_when()
df %>%
  mutate(type = case_when(player == 'a' | player == 'b' ~ 'starter',
                          player == 'c' | player == 'd' ~ 'backup',
                          position == 'G' ~ 'reserve'))

  player position points rebounds    type
1      a        G     12        5 starter
2      b        F     15        7 starter
3      c        F     19        7  backup
4      d        G     22       12  backup
5      e        G     32       11 reserve
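When a condition lists several allowed values, the %in% operator is a more compact way to write a chain of == tests joined by |. The sketch below, assuming dplyr is installed, should produce the same 'type' values as the code above. Note that players a and d also satisfy position == 'G', but because case_when() stops at the first matching condition, only player e falls through to 'reserve'.

```r
library(dplyr)

# re-create the example data frame from this tutorial
df <- data.frame(player = c('a', 'b', 'c', 'd', 'e'),
                 position = c('G', 'F', 'F', 'G', 'G'),
                 points = c(12, 15, 19, 22, 32),
                 rebounds = c(5, 7, 7, 12, 11))

# player %in% c('a', 'b') is TRUE when player is 'a' or 'b'
typed <- df %>%
  mutate(type = case_when(player %in% c('a', 'b') ~ 'starter',
                          player %in% c('c', 'd') ~ 'backup',
                          position == 'G' ~ 'reserve'))
typed$type
#> [1] "starter" "starter" "backup"  "backup"  "reserve"
```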
The following code shows how to create a new variable called ‘valueAdded’ based on the values in the points and rebounds columns:
library(dplyr)

#define new variable 'valueAdded' using mutate() and case_when()
df %>%
  mutate(valueAdded = case_when(points <= 15 & rebounds <= 5 ~ 2,
                                points <= 15 & rebounds > 5 ~ 4,
                                points < 25 & rebounds < 8 ~ 6,
                                points < 25 & rebounds > 8 ~ 7,
                                points >= 25 ~ 9))

  player position points rebounds valueAdded
1      a        G     12        5          2
2      b        F     15        7          4
3      c        F     19        7          6
4      d        G     22       12          7
5      e        G     32       11          9
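Because conditions that combine several columns are written by hand, it is easy to leave gaps between them. As a minimal sketch, assuming dplyr is installed, the code below applies one set of valueAdded conditions consistent with the output above to the five original rows plus a hypothetical sixth row with 20 points and 8 rebounds. With one rule testing rebounds < 8 and another testing rebounds > 8, a value of exactly 8 satisfies neither, so that row gets NA.

```r
library(dplyr)

# the five rows from the tutorial plus a hypothetical sixth row (20 points, 8 rebounds)
points <- c(12, 15, 19, 22, 32, 20)
rebounds <- c(5, 7, 7, 12, 11, 8)

# one set of conditions consistent with the valueAdded output shown above
value_added <- case_when(points <= 15 & rebounds <= 5 ~ 2,
                         points <= 15 & rebounds > 5 ~ 4,
                         points < 25 & rebounds < 8 ~ 6,
                         points < 25 & rebounds > 8 ~ 7,
                         points >= 25 ~ 9)
value_added
#> [1]  2  4  6  7  9 NA
```

If rows with exactly 8 rebounds should land in the 7 tier, changing rebounds > 8 to rebounds >= 8 would close the gap; which tier is right depends on what valueAdded is meant to measure.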
Additional Resources
How to Rename Columns in R
How to Remove Columns in R
How to Filter Rows in R