How to Drop Duplicate Columns in Pandas (With Examples)

by Erma Khan

You can use the following basic syntax to drop duplicate columns in pandas:

df.T.drop_duplicates().T
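
As a rough sketch of why this one-liner works, the three chained steps can be written out separately. The small DataFrame and the variable names below (transposed, deduped, result) are made up for illustration and are not part of the examples that follow.

```python
import pandas as pd

# Small illustrative DataFrame (made-up values): 'b' repeats the values of 'a'
df = pd.DataFrame({'a': [1, 2, 3],
                   'b': [1, 2, 3],
                   'c': [4, 5, 6]})

# Step 1: transpose, so each original column becomes a row
transposed = df.T

# Step 2: drop duplicate rows (the first occurrence is kept by default)
deduped = transposed.drop_duplicates()

# Step 3: transpose back, so the remaining rows become columns again
result = deduped.T

print(list(result.columns))
```

Because drop_duplicates() compares whole rows, two columns count as duplicates only when every value matches.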

The following examples show how to use this syntax in practice.

Example: Drop Duplicate Columns in Pandas

Suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame with duplicate columns
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [25, 12, 15, 14, 19, 23, 25, 29],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#rename 'assists' to 'points' so that two columns share the same name
df.columns = ['team', 'points', 'points', 'rebounds']

#view DataFrame
df

	team	points	points	rebounds
0	A	25	25	11
1	A	12	12	8
2	A	15	15	10
3	A	14	14	6
4	B	19	19	6
5	B	23	23	5
6	B	25	25	9
7	B	29	29	12

We can use the following code to remove the duplicate ‘points’ column:

#remove duplicate columns
df.T.drop_duplicates().T

	team	points	rebounds
0	A	25	11
1	A	12	8
2	A	15	10
3	A	14	6
4	B	19	6
5	B	23	5
6	B	25	9
7	B	29	12

Notice that the duplicate ‘points’ column has been removed while all other columns remain in the DataFrame.
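
One side effect may be worth checking. Because ‘team’ holds strings while the other columns hold integers, transposing goes through a mixed-type layout, and the numeric columns typically come back with the object dtype. The sketch below rebuilds the DataFrame from this example and inspects the dtypes. Using infer_objects() to restore them is one possible remedy rather than a required step, and the names result and restored are illustrative.

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [25, 12, 15, 14, 19, 23, 25, 29],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})
df.columns = ['team', 'points', 'points', 'rebounds']

result = df.T.drop_duplicates().T

# Inspect the dtypes after the round trip through the transpose
print(result.dtypes)

# Ask pandas to re-infer a more specific dtype for each object column
restored = result.infer_objects()
print(restored.dtypes)
```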

It’s also worth noting that df.T.drop_duplicates().T removes duplicate columns even if they have different names, as long as they contain identical values. Only the first of the identical columns is kept.

For example, suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame with duplicate columns
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'points2': [25, 12, 15, 14, 19, 23, 25, 29],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#view DataFrame
df

	team	points	points2	rebounds
0	A	25	25	11
1	A	12	12	8
2	A	15	15	10
3	A	14	14	6
4	B	19	19	6
5	B	23	23	5
6	B	25	25	9
7	B	29	29	12

Notice that the ‘points’ and ‘points2’ columns contain identical values.

We can use the following code to remove the duplicate ‘points2’ column:

#remove duplicate columns
df.T.drop_duplicates().T

	team	points	rebounds
0	A	25	11
1	A	12	8
2	A	15	10
3	A	14	6
4	B	19	6
5	B	23	5
6	B	25	9
7	B	29	12
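
If keeping the original dtypes matters, or if only repeated column names should count as duplicates, a boolean mask is a possible alternative to the double transpose. The following is a minimal sketch under those assumptions, not the method used in the examples above. The names is_dup_values, by_values and by_name are illustrative.

```python
import pandas as pd

# Same DataFrame as in the second example
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'points2': [25, 12, 15, 14, 19, 23, 25, 29],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# Boolean mask: True for each column whose values repeat an earlier column
is_dup_values = df.T.duplicated().to_numpy()

# Select from the original DataFrame, so the column dtypes are left untouched
by_values = df.loc[:, ~is_dup_values]

# Compare column names only; the column values are not inspected
by_name = df.loc[:, ~df.columns.duplicated()]

print(list(by_values.columns))
print(list(by_name.columns))
```

Here by_values drops ‘points2’ because its values repeat ‘points’, while by_name keeps all four columns because no column name is repeated.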

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Drop Duplicate Rows in a Pandas DataFrame
How to Drop Columns in Pandas
How to Exclude Columns in Pandas
