You can use the following basic syntax to merge together columns in a pandas DataFrame that share the same column name:
#define function to merge columns with same names together def same_merge(x): return ','.join(x[x.notnull()].astype(str)) #define new DataFrame that merges columns with same names together df_new = df.groupby(level=0, axis=1).apply(lambda x: x.apply(same_merge, axis=1))
The following example shows how to use this syntax in practice.
Example: Merge Together Columns Sharing Same Name in Pandas
Suppose we have the following pandas DataFrame:
import pandas as pd import numpy as np #create DataFrame df = pd.DataFrame({'A': [5, 6, 8, np.nan, 4, np.nan, np.nan], 'A1': [np.nan, 12, np.nan, 10, np.nan, 6, 4], 'B': [2, 7, np.nan, np.nan, 2, 4, np.nan], 'B1': [5, np.nan, 6, 15, 1, np.nan, 4]}) #rename columns so there are duplicate column names df.columns = ['A', 'A', 'B', 'B'] #view DataFrame print(df) A A B B 0 5.0 NaN 2.0 5.0 1 6.0 12.0 7.0 NaN 2 8.0 NaN NaN 6.0 3 NaN 10.0 NaN 15.0 4 4.0 NaN 2.0 1.0 5 NaN 6.0 4.0 NaN 6 NaN 4.0 NaN 4.0
Notice that two columns have a name of ‘A’ and two columns have a name of ‘B.’
We can use the following code to merge the columns that have the same column names and concatenate their values together with a comma:
#define function to merge columns with same names together def same_merge(x): return ','.join(x[x.notnull()].astype(str)) #define new DataFrame that merges columns with same names together df_new = df.groupby(level=0, axis=1).apply(lambda x: x.apply(same_merge, axis=1)) #view new DataFrame print(df_new) A B 0 5.0 2.0,5.0 1 6.0,12.0 7.0 2 8.0 6.0 3 10.0 15.0 4 4.0 2.0,1.0 5 6.0 4.0 6 4.0 4.0
The new DataFrame has merged together the columns with the same names and concatenated their values together with a comma.
If you would like to use a different separator, simply change the comma separator to something else in the same_merge() function.
For example, the following code shows how to use a semi-colon separator instead:
#define function to merge columns with same names together def same_merge(x): return ';'.join(x[x.notnull()].astype(str)) #define new DataFrame that merges columns with same names together df_new = df.groupby(level=0, axis=1).apply(lambda x: x.apply(same_merge, axis=1)) #view new DataFrame print(df_new) A B 0 5.0 2.0;5.0 1 6.0;12.0 7.0 2 8.0 6.0 3 10.0 15.0 4 4.0 2.0;1.0 5 6.0 4.0 6 4.0 4.0
The new DataFrame has merged together the columns with the same names and concatenated their values together with a semi-colon.
Additional Resources
The following tutorials explain how to perform other common operations in pandas:
How to Drop Duplicate Columns in Pandas
How to List All Column Names in Pandas
How to Sort Columns by Name in Pandas