How to Remove Duplicate Rows in R (With Examples)

by Erma Khan

You can use one of the following two methods to remove duplicate rows from a data frame in R:

Method 1: Use Base R

#remove duplicate rows across entire data frame
df[!duplicated(df), ]

#remove duplicate rows across specific columns of data frame
df[!duplicated(df[c('var1')]), ]

Method 2: Use dplyr

#remove duplicate rows across entire data frame 
df %>%
  distinct(.keep_all = TRUE)

#remove duplicate rows across specific columns of data frame
df %>%
  distinct(var1, .keep_all = TRUE)

The examples below show how to use this syntax in practice with the following data frame:

#define data frame
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('Guard', 'Guard', 'Forward', 'Guard', 'Center', 'Center'))

#view data frame
df

  team position
1    A    Guard
2    A    Guard
3    A  Forward
4    B    Guard
5    B   Center
6    B   Center

Example 1: Remove Duplicate Rows Using Base R

The following code shows how to remove duplicate rows from a data frame using functions from base R:

#remove duplicate rows from data frame
df[!duplicated(df), ]

  team position
1    A    Guard
3    A  Forward
4    B    Guard
5    B   Center
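
Before dropping rows, it can help to check which rows duplicated() flags. The snippet below is a minimal sketch using the same example data frame; the variable names dup, n_dup, dropped and deduped are chosen here for illustration only. It also shows that base R's unique() should return the same rows as the subsetting approach above:

```r
# same example data frame as above
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('Guard', 'Guard', 'Forward', 'Guard', 'Center', 'Center'))

# logical vector: TRUE marks a row that repeats an earlier row
dup <- duplicated(df)

# count the duplicate rows
n_dup <- sum(dup)

# view the rows that would be dropped (rows 2 and 6 here)
dropped <- df[dup, ]

# unique() is a base R shortcut for df[!duplicated(df), ]
deduped <- unique(df)
```

Note that duplicated() only flags the second and later occurrences of a row, so the first occurrence is always the one that is kept.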

The following code shows how to remove rows that contain duplicate values in specific columns of a data frame using base R. Only the first row for each distinct value is kept:

#remove rows where there are duplicates in the 'team' column
df[!duplicated(df[c('team')]), ]

  team position
1    A    Guard
4    B    Guard
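
The same pattern can be adapted in two ways that may be useful: keeping the last row for each value instead of the first, and checking a combination of columns. The following is a sketch under those assumptions; fromLast is a standard argument of duplicated(), while the names last_per_team and by_both are made up for this example:

```r
# same example data frame as above
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('Guard', 'Guard', 'Forward', 'Guard', 'Center', 'Center'))

# keep the LAST row for each team instead of the first
last_per_team <- df[!duplicated(df[c('team')], fromLast = TRUE), ]
last_per_team
#   team position
# 3    A  Forward
# 6    B   Center

# pass several column names to treat their combination as the key
by_both <- df[!duplicated(df[c('team', 'position')]), ]
```

Since this data frame only has two columns, by_both contains the same four rows as the whole-data-frame example.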

Example 2: Remove Duplicate Rows Using dplyr

The following code shows how to remove duplicate rows from a data frame using the distinct() function from the dplyr package:

library(dplyr)

#remove duplicate rows from data frame
df %>%
  distinct(.keep_all = TRUE)

  team position
1    A    Guard
2    A  Forward
3    B    Guard
4    B   Center

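Note that the .keep_all argument tells distinct() to keep all of the columns from the original data frame. It only matters when specific columns are passed to distinct(); when no columns are named, every column is used and kept regardless.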

The following code shows how to use the distinct() function to remove rows that contain duplicate values in specific columns of a data frame:

library(dplyr)

#remove rows where there are duplicates in the 'team' column
df %>%
  distinct(team, .keep_all = TRUE)

  team position
1    A    Guard
2    B    Guard
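
To see what .keep_all changes, it may help to run distinct() without it. The snippet below is a minimal sketch assuming dplyr is installed; the names teams_only and combos are illustrative only:

```r
library(dplyr)

# same example data frame as above
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('Guard', 'Guard', 'Forward', 'Guard', 'Center', 'Center'))

# without .keep_all, only the columns named in distinct() are returned
teams_only <- df %>%
  distinct(team)

# naming several columns removes duplicates based on their combination
combos <- df %>%
  distinct(team, position)
```

Here teams_only has a single team column with two rows, while combos has four rows, one for each unique team and position pair.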

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Remove Rows in R Based on Condition
How to Remove Rows with NA in One Specific Column in R
