You can use the following syntax to calculate the bin counts of one variable grouped by another variable in pandas:
#define bins groups = df.groupby(['group_var', pd.cut(df.value_var, bins)]) #display bin count by group variable groups.size().unstack()
The following example shows how to use this syntax in practice.
Example: Use GroupBy with Bin Counts in Pandas
Suppose we have the following pandas DataFrame that shows the points scored by basketball players on various teams:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
'points': [4, 7, 7, 11, 12, 15, 19, 19,
5, 5, 11, 12, 14, 14, 15, 15]})
#view DataFrame
print(df)
team points
0 A 4
1 A 7
2 A 7
3 A 11
4 A 12
5 A 15
6 A 19
7 A 19
8 B 5
9 B 5
10 B 11
11 B 12
12 B 14
13 B 14
14 B 15
15 B 15
We can use the following syntax to calculate the frequency of points for each team, grouped into specific bins:
#define groups
groups = df.groupby(['team', pd.cut(df.points, [0, 10, 15, 20])])
#display bin count grouped by team
groups.size().unstack()
points (0, 10] (10, 15] (15, 20]
team
A 3 3 2
B 2 6 0
Here’s how to interpret the output:
- A total of 3 players on team A scored between 0 and 10 points.
- A total of 3 players on team A scored between 10 and 15 points.
- A total of 2 players on team A scored between 15 and 20 points.
And so on.
Note that we can specify whichever bins we’d like within the pd.cut() function.
For example, we could define just two bins:
#define groups
groups = df.groupby(['team', pd.cut(df.points, [0, 10, 20])])
#display bin count grouped by team
groups.size().unstack()
points (0, 10] (10, 20]
team
A 3 5
B 2 6
Here’s how to interpret the output:
- A total of 3 players on team A scored between 0 and 10 points.
- A total of 5 players on team A scored between 10 and 20 points.
- A total of 2 players on team B scored between 0 and 10 points.
- A total of 6 players on team B scored between 10 and 20 points.
Note 1: You can find the complete documentation for the GroupBy function here.
Note 2: You can find the complete documentation for the cut function here.
Additional Resources
The following tutorials explain how to perform other common operations in pandas:
Pandas: How to Calculate Cumulative Sum by Group
Pandas: How to Count Unique Values by Group
Pandas: How to Calculate Mode by Group
Pandas: How to Calculate Correlation By Group