You can use the following methods to count duplicates in a pandas DataFrame:
Method 1: Count Duplicate Values in One Column
len(df['my_column'])-len(df['my_column'].drop_duplicates())
Method 2: Count Duplicate Rows
len(df)-len(df.drop_duplicates())
Method 3: Count Duplicates for Each Unique Row
df.groupby(df.columns.tolist(), as_index=False).size()
The examples below show how to use each method in practice with this pandas DataFrame:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'G', 'G', 'F', 'F'],
                   'points': [5, 5, 8, 10, 5, 7, 10, 10]})

#view DataFrame
print(df)

  team position  points
0    A        G       5
1    A        G       5
2    A        G       8
3    A        F      10
4    B        G       5
5    B        G       7
6    B        F      10
7    B        F      10
Example 1: Count Duplicate Values in One Column
The following code shows how to count the number of duplicate values in the points column:
#count duplicate values in points column
len(df['points'])-len(df['points'].drop_duplicates())
4
We can see that there are 4 duplicate values in the points column: of its 8 entries, 4 repeat a value that already appeared earlier in the column.
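As a cross-check, the same count can be obtained in other ways. The sketch below is one possible alternative, not the only approach: it uses `duplicated()`, which flags every value that repeats an earlier one, and `nunique()`, which counts distinct values. The variable names `dup_count` and `alt_count` are illustrative only.

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'G', 'G', 'F', 'F'],
                   'points': [5, 5, 8, 10, 5, 7, 10, 10]})

# duplicated() returns True for each value that repeats an earlier one,
# so summing the boolean Series counts the duplicates
dup_count = int(df['points'].duplicated().sum())

# Equivalent: total number of values minus number of distinct values.
# dropna=False treats NaN as a distinct value, matching drop_duplicates()
alt_count = len(df['points']) - df['points'].nunique(dropna=False)

print(dup_count, alt_count)
```

Both expressions should agree with the `drop_duplicates()` approach on this DataFrame. The `dropna=False` argument only matters when the column contains missing values.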
Example 2: Count Duplicate Rows
The following code shows how to count the number of duplicate rows in the DataFrame:
#count number of duplicate rows
len(df)-len(df.drop_duplicates())
2
We can see that there are 2 duplicate rows in the DataFrame.
We can use the following syntax to view these 2 duplicate rows:
#display duplicated rows
df[df.duplicated()]
  team position  points
1    A        G       5
7    B        F      10
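Note that `duplicated()` marks only the second and later copies of a row by default, which is why rows 0 and 6 (the first copies) are not shown. The following sketch, with illustrative variable names, counts duplicate rows directly and uses the `keep=False` argument to display every copy of a repeated row:

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'G', 'G', 'F', 'F'],
                   'points': [5, 5, 8, 10, 5, 7, 10, 10]})

# Count rows that repeat an earlier row (default keep='first')
dup_rows = int(df.duplicated().sum())

# keep=False flags every member of a duplicated group,
# including the first occurrence
all_copies = df[df.duplicated(keep=False)]

print(dup_rows)
print(all_copies)
```

With `keep=False`, the result should contain four rows (two copies of each repeated row) rather than two.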
Example 3: Count Duplicates for Each Unique Row
The following code shows how to count how many times each unique row occurs in the DataFrame:
#display number of duplicates for each unique row
df.groupby(df.columns.tolist(), as_index=False).size()
  team position  points  size
0    A        F      10     1
1    A        G       5     2
2    A        G       8     1
3    B        F      10     2
4    B        G       5     1
5    B        G       7     1
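The size column displays the number of times each unique row occurs in the DataFrame. A value of 1 means the row has no duplicates, while a value of 2 means the row appears twice.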
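If only the repeated rows are of interest, the grouped result can be filtered on the size column. The sketch below is one way to do this; the names `counts` and `repeated` are illustrative and not part of the original example.

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'G', 'G', 'F', 'F'],
                   'points': [5, 5, 8, 10, 5, 7, 10, 10]})

# Number of occurrences of each unique row
counts = df.groupby(df.columns.tolist(), as_index=False).size()

# Keep only the unique rows that occur more than once.
# Use counts['size'] rather than counts.size, since .size is a
# built-in DataFrame attribute (the total number of cells)
repeated = counts[counts['size'] > 1]

print(repeated)
```

Subtracting 1 from each size value and summing gives the number of extra copies, which should match the duplicate row count from Example 2.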
Additional Resources
The following tutorials explain how to perform other common operations in pandas:
How to Drop Duplicate Rows in Pandas
How to Drop Duplicate Columns in Pandas
How to Select Columns by Index in Pandas