You can use the following basic syntax to give custom names to the aggregated columns produced by groupby() in pandas:
df.groupby('group_col').agg(sum_col1=('col1', 'sum'),
                            mean_col2=('col2', 'mean'),
                            max_col3=('col3', 'max'))
This particular example calculates three aggregated columns and names them sum_col1, mean_col2, and max_col3.
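Each keyword argument follows the pattern new_name=(source_column, aggregation). As a minimal sketch, the tuple form can also be written with pd.NamedAgg, which makes the two parts explicit. The data below is hypothetical and only mirrors the placeholder column names used in the syntax above:

```python
import pandas as pd

# Hypothetical data; column names mirror the placeholders in the basic syntax
df = pd.DataFrame({'group_col': ['x', 'x', 'y'],
                   'col1': [1, 2, 3],
                   'col2': [10.0, 20.0, 30.0],
                   'col3': [7, 9, 8]})

# Tuple shorthand: new_name=(source_column, aggregation)
short = df.groupby('group_col').agg(sum_col1=('col1', 'sum'),
                                    mean_col2=('col2', 'mean'),
                                    max_col3=('col3', 'max'))

# Explicit form of the same aggregation using pd.NamedAgg
explicit = df.groupby('group_col').agg(
    sum_col1=pd.NamedAgg(column='col1', aggfunc='sum'),
    mean_col2=pd.NamedAgg(column='col2', aggfunc='mean'),
    max_col3=pd.NamedAgg(column='col3', aggfunc='max'))

# Both forms should produce identical results
print(short.equals(explicit))
```

The tuple form is shorter, while pd.NamedAgg may read more clearly when many aggregations are listed.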
The following example shows how to use this syntax in practice.
Example: Rename Columns in Groupby Function in Pandas
Suppose we have the following pandas DataFrame:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 6, 6, 5, 8, 7, 7, 9],
                   'rebounds': [4, 13, 15, 10, 7, 7, 5, 11]})

#view DataFrame
print(df)

  team  points  assists  rebounds
0    A      30        5         4
1    A      22        6        13
2    A      19        6        15
3    A      14        5        10
4    B      14        8         7
5    B      11        7         7
6    B      20        7         5
7    B      28        9        11
We can use the following syntax to group the rows by the team column and calculate three aggregated columns, giving each one a custom name:
#calculate several aggregated columns by group and rename aggregated columns
df.groupby('team').agg(sum_points=('points', 'sum'),
                       mean_assists=('assists', 'mean'),
                       max_rebounds=('rebounds', 'max'))

      sum_points  mean_assists  max_rebounds
team
A             85          5.50            15
B             73          7.75            11
Notice that the three aggregated columns have the custom names that we provided in the agg() function.
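In the output above, team is the index of the result rather than a regular column. If a flat table is preferred, one option (a sketch, not the only approach) is to chain reset_index() after agg():

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 6, 6, 5, 8, 7, 7, 9],
                   'rebounds': [4, 13, 15, 10, 7, 7, 5, 11]})

# Aggregate with custom names, then move 'team' from the index into a column
result = (df.groupby('team')
            .agg(sum_points=('points', 'sum'),
                 mean_assists=('assists', 'mean'),
                 max_rebounds=('rebounds', 'max'))
            .reset_index())

print(result.columns.tolist())
```

The custom names are preserved, and team appears alongside them as an ordinary column.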
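Note that we could also pass NumPy functions to agg() to calculate the sum, mean, and max values. Recent pandas releases (2.1 and later) may emit a FutureWarning when np.sum, np.mean, or np.max is passed this way, so the string names used in the previous example are the safer choice.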
import numpy as np

#calculate several aggregated columns by group and rename aggregated columns
df.groupby('team').agg(sum_points=('points', np.sum),
                       mean_assists=('assists', np.mean),
                       max_rebounds=('rebounds', np.max))

      sum_points  mean_assists  max_rebounds
team
A             85          5.50            15
B             73          7.75            11
These results match the ones from the previous example.
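The second element of each tuple does not have to be a built-in name or a NumPy function. As a hedged sketch, any callable that reduces a column to a single value should work, such as a lambda that computes the range of points per team (range_points is a name chosen here for illustration):

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 6, 6, 5, 8, 7, 7, 9],
                   'rebounds': [4, 13, 15, 10, 7, 7, 5, 11]})

# Mix a string aggregation with a custom callable, each under its own name
result = df.groupby('team').agg(
    sum_points=('points', 'sum'),
    range_points=('points', lambda s: s.max() - s.min()))

print(result)
```

Here team A has a points range of 30 - 14 = 16 and team B has 28 - 11 = 17.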