To select a random sample in R, we can use the sample() function, which uses the following syntax:
sample(x, size, replace = FALSE, prob = NULL)
where:
- x: A vector of elements from which to choose.
- size: The number of elements to choose.
- replace: Whether to sample with replacement or not. Default is FALSE.
- prob: A vector of probability weights for obtaining the elements of x. Default is NULL, which gives every element the same probability.
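The replace and prob arguments are easiest to understand with a quick example. The sketch below draws weighted samples from a small character vector. The vector letters_vec, the weights, and the seed value are assumptions chosen for illustration. The weights do not need to sum to 1, because sample() rescales them, but they should be non-negative and not all zero.

```r
#make the sketch reproducible (the seed value is arbitrary)
set.seed(1)

#hypothetical vector to sample from
letters_vec <- c("a", "b", "c")

#draw 10,000 values with replacement, weighting "a" most heavily
draws <- sample(letters_vec, size = 10000, replace = TRUE, prob = c(0.7, 0.2, 0.1))

#observed proportions should land close to 0.7, 0.2 and 0.1
prop.table(table(draws))

#an element with weight 0 is never chosen
only_a <- sample(letters_vec, size = 5, replace = TRUE, prob = c(1, 0, 0))
only_a
```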
This tutorial explains how to use this function to select a random sample in R from both a vector and a data frame.
Example 1: Random Sample from a Vector
The following code shows how to select a random sample from a vector without replacement:
#create vector of data
data <- c(1, 3, 5, 6, 7, 8, 10, 11, 12, 14)
#select random sample of 5 elements without replacement
sample(x=data, size=5)
[1] 10 12 5 14 7
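Because replace=FALSE, each element can appear at most once in the result, which also means size cannot be larger than the length of the vector. Here is a minimal sketch that checks both points on an example vector (the values are assumed for illustration):

```r
#example vector (values assumed for illustration)
data <- c(1, 3, 5, 6, 7, 8, 10, 11, 12, 14)

#without replacement, no element is drawn twice
s <- sample(x=data, size=5)
anyDuplicated(s) == 0

#drawing every element without replacement returns a shuffled copy
shuffled <- sample(x=data, size=length(data))
all(sort(shuffled) == sort(data))

#asking for more elements than the vector holds raises an error
res <- tryCatch(sample(x=data, size=11), error = function(e) "error")
res
```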
The following code shows how to select a random sample from a vector with replacement:
#create vector of data
data <- c(1, 3, 5, 6, 7, 8, 10, 11, 12, 14)
#select random sample of 5 elements with replacement
sample(x=data, size=5, replace=TRUE)
[1] 12 1 1 6 14
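With replace=TRUE, the same element can be drawn more than once (note the repeated 1 in the output above), so size may even exceed the length of the vector. A short sketch, again using an example vector whose values are assumed for illustration:

```r
#example vector (values assumed for illustration)
data <- c(1, 3, 5, 6, 7, 8, 10, 11, 12, 14)

#with replacement, size can be larger than length(data)
s <- sample(x=data, size=25, replace=TRUE)
length(s)

#every drawn value still comes from data
all(s %in% data)

#25 draws from only 10 distinct values must contain a repeat
anyDuplicated(s) > 0
```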
Example 2: Random Sample from a Data Frame
The following code shows how to select a random sample from a data frame:
#create data frame
df <- data.frame(x=c(3, 5, 6, 6, 8, 12, 14),
                 y=c(12, 6, 4, 23, 25, 8, 9),
                 z=c(2, 7, 8, 8, 15, 17, 29))
#view data frame
df
   x  y  z
1  3 12  2
2  5  6  7
3  6  4  8
4  6 23  8
5  8 25 15
6 12  8 17
7 14  9 29
#select random sample of three rows from data frame
rand_df <- df[sample(nrow(df), size=3), ]
#display randomly selected rows
rand_df
   x  y  z
4  6 23  8
7 14  9 29
1  3 12  2
Here’s what’s happening in this bit of code:
1. To select a subset of a data frame in R, we use the following syntax: df[rows, columns]
2. In the code above, sample(nrow(df), size=3) randomly selects 3 row numbers, and leaving the columns position empty keeps all columns.
3. The end result is a subset of the data frame with 3 randomly selected rows.
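The df[rows, columns] pattern can be wrapped in a small reusable helper. The function name sample_rows() is hypothetical (it is not part of base R). It is a sketch that forwards size and replace to sample() and passes drop = FALSE so that sampling rows from a one-column data frame still returns a data frame rather than a plain vector.

```r
#hypothetical helper: return n randomly chosen rows of a data frame
sample_rows <- function(df, n, replace = FALSE) {
  #draw n row numbers from 1..nrow(df)
  rows <- sample(seq_len(nrow(df)), size = n, replace = replace)
  #keep all columns; drop = FALSE preserves the data frame structure
  df[rows, , drop = FALSE]
}

df <- data.frame(x=c(3, 5, 6, 6, 8, 12, 14),
                 y=c(12, 6, 4, 23, 25, 8, 9),
                 z=c(2, 7, 8, 8, 15, 17, 29))

#three rows without replacement, as in the example above
rand_df <- sample_rows(df, 3)
rand_df

#a same-size resample with replacement (a bootstrap-style sample)
boot_df <- sample_rows(df, nrow(df), replace = TRUE)
boot_df
```

Each sampled row keeps its original row name, so you can always trace a sampled row back to its position in df.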
It’s important to note that each time we call the sample() function, R will generally select a different sample, since the function chooses values randomly.
To make the results of an analysis reproducible, call set.seed() with a fixed number before sampling so that the sample() function chooses the same random sample each time. For example:
#make this example reproducible
set.seed(23)
#create data frame
df <- data.frame(x=c(3, 5, 6, 6, 8, 12, 14),
                 y=c(12, 6, 4, 23, 25, 8, 9),
                 z=c(2, 7, 8, 8, 15, 17, 29))
#select random sample of three rows from data frame
rand_df <- df[sample(nrow(df), size=3), ]
#display randomly selected rows
rand_df
   x  y  z
5  8 25 15
2  5  6  7
6 12  8 17
Each time you run the above code, the same 3 rows of the data frame will be selected.
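To confirm that set.seed() really makes a draw repeatable, the sketch below resets the same seed before two separate calls and compares the results. The seed value 23 simply matches the example above. Keep in mind that the specific rows returned for a given seed can differ between R versions, because R 3.6.0 changed the default method sample() uses; RNGkind() reports which generators are currently active.

```r
df <- data.frame(x=c(3, 5, 6, 6, 8, 12, 14),
                 y=c(12, 6, 4, 23, 25, 8, 9),
                 z=c(2, 7, 8, 8, 15, 17, 29))

#first draw
set.seed(23)
first <- df[sample(nrow(df), size=3), ]

#resetting the same seed reproduces exactly the same draw
set.seed(23)
second <- df[sample(nrow(df), size=3), ]

identical(first, second)

#show which random number generators are in use, including the sampling method
RNGkind()
```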
Additional Resources
Stratified Sampling in R (With Examples)
Systematic Sampling in R (With Examples)
Cluster Sampling in R (With Examples)