You can use the following basic syntax to calculate the cumulative percentage of values in a column of a pandas DataFrame:
#calculate cumulative sum of column df['cum_sum'] = df['col1'].cumsum() #calculate cumulative percentage of column (rounded to 2 decimal places) df['cum_percent'] = round(100*df.cum_sum/df['col1'].sum(),2)
The following example shows how to use this syntax in practice.
Example: Calculate Cumulative Percentage in Pandas
Suppose we have the following pandas DataFrame that shows the number of units a company sells during consecutive years:
import pandas as pd #create DataFrame df = pd.DataFrame({'year': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'units_sold': [60, 75, 77, 87, 104, 134, 120, 125, 140, 150]}) #view DataFrame print(df) year units_sold 0 1 60 1 2 75 2 3 77 3 4 87 4 5 104 5 6 134 6 7 120 7 8 125 8 9 140 9 10 150
Next, we can use the following code to add a column that shows the cumulative number of units sold and cumulative percentage of units sold:
#calculate cumulative sum of units sold
df['cum_sum'] = df['units_sold'].cumsum()
#calculate cumulative percentage of units sold
df['cum_percent'] = round(100*df.cum_sum/df['units_sold'].sum(),2)
#view updated DataFrame
print(df)
year units_sold cum_sum cum_percent
0 1 60 60 5.60
1 2 75 135 12.59
2 3 77 212 19.78
3 4 87 299 27.89
4 5 104 403 37.59
5 6 134 537 50.09
6 7 120 657 61.29
7 8 125 782 72.95
8 9 140 922 86.01
9 10 150 1072 100.00
We interpret the cumulative percentages as follows:
- 5.60% of all sales were made in year 1.
- 12.59 of all sales were made in years 1 and 2 combined.
- 19.78% of all sales were made in years 1, 2, and 3 combined.
And so on.
Note that you can simply change the value in the round() function to change the number of decimal points shown as well.
For example, we could round the cumulative percentage to zero decimal places instead:
#calculate cumulative sum of units sold
df['cum_sum'] = df['units_sold'].cumsum()
#calculate cumulative percentage of units sold
df['cum_percent'] = round(100*df.cum_sum/df['units_sold'].sum(),0)
#view updated DataFrame
print(df)
year units_sold cum_sum cum_percent
0 1 60 60 6.0
1 2 75 135 13.0
2 3 77 212 20.0
3 4 87 299 28.0
4 5 104 403 38.0
5 6 134 537 50.0
6 7 120 657 61.0
7 8 125 782 73.0
8 9 140 922 86.0
9 10 150 1072 100.0
The cumulative percentages are now rounded to zero decimal places.
Additional Resources
The following tutorials explain how to perform other common operations in Python:
How to Create Frequency Tables in Python
How to Calculate Relative Frequency in Python