You can use the ntile() function from the dplyr package in R to break up an input vector into n buckets.
This function uses the following basic syntax:
ntile(x, n)
where:
- x: Input vector
- n: Number of buckets
Note: If the length of x is not a multiple of n, the bucket sizes can differ by up to one, with the larger buckets coming first.
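To make the bucketing rule concrete, here is a base-R sketch of how such buckets can be computed from ranks. It assumes three behaviors described in the dplyr documentation: ties are ranked by position (as with `rank(ties.method = "first")`), `NA` values stay `NA`, and larger buckets come first. `ntile_sketch()` is a hypothetical helper for illustration, not dplyr's actual source code.

```r
# Hypothetical helper: a base-R sketch of the ntile() bucketing rule.
# Assumptions: ties are ranked by position, NA stays NA,
# and the leftover elements go to the first (lowest) buckets.
ntile_sketch <- function(x, n) {
  stopifnot(n >= 1)
  # Unique ranks 1..len for non-missing values; NA ranks stay NA
  r <- rank(x, ties.method = "first", na.last = "keep")
  len <- sum(!is.na(x))      # number of non-missing values
  small <- len %/% n         # size of a smaller bucket
  big <- small + 1           # size of a larger bucket
  n_big <- len %% n          # how many buckets get one extra element
  threshold <- big * n_big   # highest rank that lands in a larger bucket
  bucket <- ifelse(r <= threshold,
                   (r - 1) %/% big + 1,                      # inside the larger buckets
                   n_big + (r - threshold - 1) %/% small + 1) # inside the smaller buckets
  as.integer(bucket)
}

ntile_sketch(1:7, 3)
```

Here seven values do not divide evenly into three buckets, so the call returns `1 1 1 2 2 3 3`: the first bucket receives the extra element.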
The following examples show how to use this function in practice.
Example 1: Use ntile() with a Vector
The following code shows how to use the ntile() function to break up a vector with 11 elements into 5 different buckets:
```r
library(dplyr)

#create vector of 11 values in ascending order
x <- c(1, 3, 4, ..., 22, 23)

#break up vector into 5 buckets
ntile(x, 5)

[1] 1 1 1 2 2 3 3 4 4 5 5
```
From the output we can see that each element from the original vector has been placed into one of five buckets.
The smallest values are assigned to bucket 1 while the largest values are assigned to bucket 5.
For example:
- The smallest values of 1, 3, and 4 are assigned to bucket 1.
- The largest values of 22 and 23 are assigned to bucket 5.
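The bucket sizes in this output (three elements in bucket 1, two in each of the others) follow from simple integer arithmetic. The short sketch below assumes, as the output above suggests, that any leftover elements go to the lowest buckets:

```r
# Worked arithmetic for Example 1: 11 elements split into 5 buckets
len <- 11
n <- 5
base_size <- len %/% n  # 2: every bucket gets at least this many elements
extra <- len %% n       # 1: number of buckets that get one more element

# The extra elements are assumed to go to the first buckets
sizes <- c(rep(base_size + 1, extra), rep(base_size, n - extra))
sizes
# [1] 3 2 2 2 2
```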
Example 2: Use ntile() with a Data Frame
Suppose we have the following data frame in R that shows the points scored by various basketball players:
```r
#create data frame
df <- data.frame(player=LETTERS[1:9],
                 points=c(12, 19, 7, 22, 24, 28, 30, 19, 15))

#view data frame
df

  player points
1      A     12
2      B     19
3      C      7
4      D     22
5      E     24
6      F     28
7      G     30
8      H     19
9      I     15
```
The following code shows how to use the ntile() function to create a new column in the data frame that assigns each player to one of three buckets based on points scored:
```r
library(dplyr)

#create new column that assigns players into buckets based on points
df$bucket <- ntile(df$points, 3)

#view updated data frame
df

  player points bucket
1      A     12      1
2      B     19      2
3      C      7      1
4      D     22      2
5      E     24      3
6      F     28      3
7      G     30      3
8      H     19      2
9      I     15      1
```
The new bucket column assigns a value between 1 and 3 to each player.
The players with the lowest points receive a value of 1 and the players with the highest points receive a value of 3.
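The same column can also be created inside a dplyr pipeline with `mutate()`. The sketch below rebuilds the Example 2 data frame and also shows a caveat worth knowing: `ntile()` ranks tied values by their position (as `row_number()` does), so equal values can be split across buckets. In this data set the two players with 19 points happen to share bucket 2, but that is not guaranteed in general.

```r
library(dplyr)

# Same data frame as in Example 2
df <- data.frame(player = LETTERS[1:9],
                 points = c(12, 19, 7, 22, 24, 28, 30, 19, 15))

# mutate() adds the bucket column as part of a pipeline
df <- df %>% mutate(bucket = ntile(points, 3))

# Ties are broken by position, so the two 20s end up in different buckets
ntile(c(10, 20, 20, 30), 2)
# [1] 1 1 2 2
```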
Additional Resources
The following tutorials explain how to use other common functions in R:
How to Use the across() Function in dplyr
How to Use the relocate() Function in dplyr
How to Use the slice() Function in dplyr