Researchers often take samples from a population and use the data from the sample to draw conclusions about the population as a whole.
One commonly used sampling method is systematic sampling, which is implemented with a simple two-step process:
1. Place each member of a population in some order.
2. Choose a random starting point and select every nth member to be in the sample.
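The two steps above can be sketched in a few lines of R. This is only a minimal illustration: the population of 20 letters, the interval of 4, and the names population, k, start, and idx are all assumptions made for the sketch.

```r
# Step 1: a hypothetical population of 20 members, already placed in order
population <- LETTERS[1:20]

# Sampling interval (assumed for illustration): keep every 4th member
k <- 4

# Step 2: pick a random starting point between 1 and k ...
set.seed(42)  # only so the sketch is reproducible
start <- sample(seq_len(k), 1)

# ... then take every k-th position from there to the end of the list
idx <- seq(from = start, to = length(population), by = k)
sample_members <- population[idx]

print(idx)
print(sample_members)
```

Whatever starting point is drawn, the selected positions are always exactly k apart, which is what distinguishes systematic sampling from simple random sampling.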
This tutorial explains how to perform systematic sampling in R.
Example: Systematic Sampling in R
Suppose a superintendent wants to obtain a sample of 100 students from a school that has 500 total students. She chooses to use systematic sampling: she places the students in alphabetical order by last name, randomly chooses a starting point, and picks every 5th student (500 / 100 = 5) to be in the sample.
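The arithmetic behind "every 5th student" can be checked directly. The sketch below uses hypothetical variable names (N, n, k, possible_starts, last_rows) and assumes the population size is an exact multiple of the sample size, as it is in this example.

```r
N <- 500  # population size (students in the school)
n <- 100  # desired sample size

# Sampling interval: how many rows to step between selected students
k <- N / n
print(k)  # 5

# Any starting row from 1 to k is valid
possible_starts <- seq_len(k)

# Row of the last selected student for each possible starting row
last_rows <- possible_starts + k * (n - 1)
print(last_rows)  # 496 497 498 499 500
```

Because the last selected row never exceeds 500 for any valid starting row, every possible start yields exactly 100 students.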
The following code shows how to create a fake data frame to work with in R:
#make this example reproducible
set.seed(1)

#create simple function to generate random last names
randomNames <- function(n = 5000) {
  do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
}

#create data frame (definition assumed: 500 students, GPA drawn from a normal distribution)
df <- data.frame(last_name = randomNames(500),
                 gpa = rnorm(500, mean = 82, sd = 3))

#view first six rows of data frame
head(df)

  last_name      gpa
1     GONBW 82.19580
2     JRRWZ 85.10598
3     ORJFW 88.78065
4     XRYNL 85.94409
5     FMDCE 79.38993
6     XZBJC 80.49061
And the following code shows how to obtain a sample of 100 students through systematic sampling:
#define function to obtain systematic sample
obtain_sys = function(N,n){
  k = ceiling(N/n)
  r = sample(1:k, 1)
  seq(r, r + k*(n-1), k)
}

#obtain systematic sample
sys_sample_df = df[obtain_sys(nrow(df), 100), ]

#view first six rows of data frame
head(sys_sample_df)

   last_name      gpa
3      ORJFW 88.78065
8      RWPSB 81.96988
13     RACZU 79.21433
18     ZOHKA 80.47246
23     QJETK 87.09991
28     JTHWB 83.87300

#view dimensions of data frame
dim(sys_sample_df)

[1] 100   2
Notice that the first member included in the sample was in row 3 of the original data frame. Each subsequent member in the sample is located 5 rows after the previous member.
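The spacing can also be verified programmatically rather than by eye. The following sketch repeats the obtain_sys helper so that it runs on its own and applies it to a hypothetical data frame, toy_df, whose columns are made up for illustration. It relies on the fact that subsetting a data frame with default row names keeps the original row numbers as row names.

```r
# Same helper as in the tutorial, repeated so this sketch is self-contained
obtain_sys <- function(N, n) {
  k <- ceiling(N / n)
  r <- sample(1:k, 1)
  seq(r, r + k * (n - 1), k)
}

set.seed(1)

# Hypothetical stand-in for the student data: 500 rows, made-up columns
toy_df <- data.frame(id = 1:500, score = rnorm(500))

# Draw the systematic sample
toy_sample <- toy_df[obtain_sys(nrow(toy_df), 100), ]

# Row names keep the original row numbers, so their differences are the gaps
row_gaps <- diff(as.integer(rownames(toy_sample)))

print(all(row_gaps == 5))  # TRUE when every selected row is 5 after the previous one
print(nrow(toy_sample))
```

This check assumes the population size is an exact multiple of the sample size. When it is not, ceiling(N/n) rounds the interval up and the helper can generate row indices larger than N, which R returns as rows of NA values.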
The output of dim() confirms that the systematic sample is a data frame with 100 rows and 2 columns.
Additional Resources
Types of Sampling Methods
Stratified Sampling in R
Cluster Sampling in R