You can use the following basic syntax to drop duplicate columns in pandas:
df.T.drop_duplicates().T
The following example shows how to use this syntax in practice.
Example: Drop Duplicate Columns in Pandas
Suppose we have the following pandas DataFrame:
import pandas as pd

#create DataFrame with duplicate columns
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [25, 12, 15, 14, 19, 23, 25, 29],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

df.columns = ['team', 'points', 'points', 'rebounds']

#view DataFrame
df

  team  points  points  rebounds
0    A      25      25        11
1    A      12      12         8
2    A      15      15        10
3    A      14      14         6
4    B      19      19         6
5    B      23      23         5
6    B      25      25         9
7    B      29      29        12
We can use the following code to remove the duplicate ‘points’ column:
#remove duplicate columns
df.T.drop_duplicates().T

  team points rebounds
0    A     25       11
1    A     12        8
2    A     15       10
3    A     14        6
4    B     19        6
5    B     23        5
6    B     25        9
7    B     29       12
Notice that the duplicate ‘points’ column has been removed while all other columns, including the first ‘points’ column, remained in the DataFrame.
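One side effect is worth checking before relying on this approach: transposing a DataFrame that mixes strings and numbers stores everything as generic objects, so the numeric columns come back with the object dtype. The sketch below uses a shortened version of the DataFrame above and compares the double transpose with a boolean-mask alternative that leaves the original dtypes alone. The variable names `transposed`, `keep`, and `masked` are illustrative and not part of the original example.

```python
import pandas as pd

# Shortened version of the example DataFrame (4 rows instead of 8)
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B'],
                   'points': [25, 12, 19, 23],
                   'assists': [25, 12, 19, 23],
                   'rebounds': [11, 8, 6, 5]})
df.columns = ['team', 'points', 'points', 'rebounds']

# Double transpose: values are kept, but numeric columns become object dtype
transposed = df.T.drop_duplicates().T

# Alternative: flag columns whose values repeat an earlier column,
# then select the remaining columns by position with a boolean mask
keep = ~df.T.duplicated().to_numpy()
masked = df.loc[:, keep]

# Compare the resulting dtypes of the two approaches
print(transposed.dtypes)
print(masked.dtypes)
```

Both results contain the same columns and values. If the double transpose is preferred, calling `.infer_objects()` on its result is one way to try to recover the numeric dtypes.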
It’s also worth noting that this code compares column values, not column names, so it will remove a column that contains the same values as an earlier column even if the two columns have different names.
For example, suppose we have the following pandas DataFrame:
import pandas as pd

#create DataFrame with duplicate columns
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'points2': [25, 12, 15, 14, 19, 23, 25, 29],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#view DataFrame
df

  team  points  points2  rebounds
0    A      25       25        11
1    A      12       12         8
2    A      15       15        10
3    A      14       14         6
4    B      19       19         6
5    B      23       23         5
6    B      25       25         9
7    B      29       29        12
Notice that the ‘points’ and ‘points2’ columns contain identical values.
We can use the following code to remove the duplicate ‘points2’ column:
#remove duplicate columns
df.T.drop_duplicates().T

  team points rebounds
0    A     25       11
1    A     12        8
2    A     15       10
3    A     14        6
4    B     19        6
5    B     23        5
6    B     25        9
7    B     29       12
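To make the difference between duplicate names and duplicate values concrete, the sketch below contrasts the transpose approach with a name-based check built on `df.columns.duplicated()`. It uses a shortened version of the DataFrame above, and the variable names `by_name` and `by_value` are illustrative only.

```python
import pandas as pd

# Shortened version of the example DataFrame: 'points' and 'points2' hold the same values
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B'],
                   'points': [25, 12, 19, 23],
                   'points2': [25, 12, 19, 23],
                   'rebounds': [11, 8, 6, 5]})

# Name-based check: only drops columns whose label repeats an earlier label
by_name = df.loc[:, ~df.columns.duplicated()]

# Value-based check: drops columns whose contents repeat an earlier column
by_value = df.T.drop_duplicates().T

print(list(by_name.columns))   # ['team', 'points', 'points2', 'rebounds']
print(list(by_value.columns))  # ['team', 'points', 'rebounds']
```

The name-based check keeps ‘points2’ because its label is unique, so it is the safer choice when two columns might legitimately hold the same values and should both be retained.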
Additional Resources
The following tutorials explain how to perform other common functions in pandas:
How to Drop Duplicate Rows in a Pandas DataFrame
How to Drop Columns in Pandas
How to Exclude Columns in Pandas