You can use the following basic syntax to use GroupBy on a pandas DataFrame with a multiindex:
#calculate sum by level 0 and 1 of multiindex df.groupby(level=[0,1]).sum() #calculate count by level 0 and 1 of multiindex df.groupby(level=[0,1]).count() #calculate max value by level 0 and 1 of multiindex df.groupby(level=[0,1]).max() ...
Each of these examples calculate some metric grouped by two levels of a multiindex pandas DataFrame.
The following example shows how to use this syntax in practice.
Example: Use GroupBy on MultiIndex in pandas
Suppose we have the following pandas DataFrame with a multiindex:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
'position': ['G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'],
'points': [6, 8, 9, 11, 13, 8, 8, 15]})
#define multiindex
df.set_index(['team', 'position'], inplace=True)
#view DataFrame
print(df)
points
team position
A G 6
G 8
F 9
F 11
B G 13
G 8
F 8
F 15
We can use the following syntax to calculate the sum of the points values grouped by both levels of the multiindex:
#calculate sum of points grouped by both levels of the multiindex:
df.groupby(level=[0,1]).sum()
points
team position
A F 20
G 14
B F 23
G 21
We can use similar syntax to calculate the max of the points values grouped by both levels of the multiindex:
#calculate max of points grouped by both levels of the multiindex:
df.groupby(level=[0,1]).max()
points
team position
A F 11
G 8
B F 15
G 13
We can use similar syntax to calculate any value we’d like grouped by several levels of a multiindex.
Note: You can find the complete documentation for the GroupBy operation in pandas here.
Additional Resources
The following tutorials explain how to perform other common operations in pandas:
Pandas: How to Calculate Cumulative Sum by Group
Pandas: How to Count Unique Values by Group
Pandas: How to Calculate Correlation By Group