You can use the following methods to plot histograms by group in a pandas DataFrame:
Method 1: Plot Histograms by Group Using Multiple Plots
df['values_var'].hist(by=df['group_var'])
Method 2: Plot Histograms by Group Using One Plot
plt.hist(group1, alpha=0.5, label='group1')
plt.hist(group2, alpha=0.5, label='group2')
plt.hist(group3, alpha=0.5, label='group3')
The examples below show how to use each method in practice with a pandas DataFrame that contains simulated points scored by basketball players on three different teams:
import pandas as pd
import numpy as np

#make this example reproducible
np.random.seed(1)

#create DataFrame
df = pd.DataFrame({'team': np.repeat(['A', 'B', 'C'], 100),
                   'points': np.random.normal(loc=20, scale=2, size=300)})

#view head of DataFrame
print(df.head())

  team     points
0    A  23.248691
1    A  18.776487
2    A  18.943656
3    A  17.854063
4    A  21.730815
Example 1: Plot Histograms by Group Using Multiple Plots
The following code shows how to create three histograms that display the distribution of points scored by players on each of the three teams:
#create histograms of points by team
df['points'].hist(by=df['team'])
We can also use the edgecolor argument to add edge lines to the bars of each histogram and the figsize argument to increase the size of the overall figure, which makes the histograms easier to view:
#create histograms of points by team
df['points'].hist(by=df['team'], edgecolor='black', figsize=(8, 6))
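By default each panel chooses its own axis limits, which can make the teams harder to compare at a glance. The following is a minimal sketch, assuming the same simulated DataFrame as above, that uses the layout, sharex, and sharey arguments of DataFrame.hist to place the three panels in one row on common axes. The 1-by-3 layout, the 15 bins, and the figure size are illustrative choices and are not part of the original example:

```python
import numpy as np
import pandas as pd

#recreate the example DataFrame
np.random.seed(1)
df = pd.DataFrame({'team': np.repeat(['A', 'B', 'C'], 100),
                   'points': np.random.normal(loc=20, scale=2, size=300)})

#one row of three panels that share x- and y-axis limits (values are illustrative)
axes = df.hist(column='points', by='team', bins=15, layout=(1, 3),
               sharex=True, sharey=True, edgecolor='black', figsize=(10, 4))

#label the x-axis on every panel
for ax in np.ravel(axes):
    ax.set_xlabel('Points')
```

In a notebook the figure renders automatically. In a script, import matplotlib.pyplot as plt and call plt.show() after these lines.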
Example 2: Plot Histograms by Group Using One Plot
The following code shows how to create three histograms and place them all on the same plot:
import matplotlib.pyplot as plt
#define points values by group
A = df.loc[df['team'] == 'A', 'points']
B = df.loc[df['team'] == 'B', 'points']
C = df.loc[df['team'] == 'C', 'points']
#add three histograms to one plot
plt.hist(A, alpha=0.5, label='A')
plt.hist(B, alpha=0.5, label='B')
plt.hist(C, alpha=0.5, label='C')
#add plot title and axis labels
plt.title('Points Distribution by Team')
plt.xlabel('Points')
plt.ylabel('Frequency')
#add legend
plt.legend(title='Team')
#display plot
plt.show()
The end result is one plot that displays three overlaid histograms.
Note: The alpha argument specifies the transparency of each histogram. This value can range from 0 (fully transparent) to 1 (fully opaque). Setting it to 0.5 makes each histogram partly see-through, so all three remain visible where they overlap.
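Because each plt.hist call above chooses its own bin edges from that group's range, bars from different teams may not line up exactly. The following is a minimal sketch of one way to align them, assuming the same simulated DataFrame: compute a single set of bin edges from all of the data with np.histogram_bin_edges and reuse it for every team. The choice of 20 bins is arbitrary and only for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

#recreate the example DataFrame
np.random.seed(1)
df = pd.DataFrame({'team': np.repeat(['A', 'B', 'C'], 100),
                   'points': np.random.normal(loc=20, scale=2, size=300)})

#compute one set of bin edges from all points (20 bins is an arbitrary choice)
edges = np.histogram_bin_edges(df['points'], bins=20)

#draw one histogram per team on the same axes using the shared edges
fig, ax = plt.subplots()
for team, points in df.groupby('team')['points']:
    ax.hist(points, bins=edges, alpha=0.5, label=team)

#add plot title, axis labels, and legend
ax.set_title('Points Distribution by Team')
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
ax.legend(title='Team')
```

Looping over df.groupby also avoids writing one line per team, so the same code works if more teams are added. In a script, call plt.show() at the end to display the plot.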
Additional Resources
The following tutorials explain how to create other common plots in Python:
How to Plot Multiple Lines in Matplotlib
How to Create Boxplot from Pandas DataFrame
How to Plot Multiple Pandas Columns on Bar Chart