You can use the following syntax to calculate a cumulative sum by group in pandas:
df['cumsum_col'] = df.groupby(['col1'])['col2'].cumsum()
This particular formula calculates the cumulative sum of col2, grouped by col1, and displays the results in a new column titled cumsum_col.
The following example shows how to use this syntax in practice.
Example: Calculate Cumulative Sum by Group in Pandas
Suppose we have the following pandas DataFrame that contains information about sales for various stores:
import pandas as pd #create DataFrame df = pd.DataFrame({'store': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'], 'sales': [4, 7, 10, 5, 8, 9, 12, 15, 10, 8]}) #view DataFrame print(df) store sales 0 A 4 1 A 7 2 A 10 3 A 5 4 A 8 5 B 9 6 B 12 7 B 15 8 B 10 9 B 8
We can use the following syntax to calculate the cumulative sum of sales for each store:
#add column that shows cumulative sum of sales by store
df['cumsum_sales'] = df.groupby(['store'])['sales'].cumsum()
#view updated DataFrame
print(df)
store sales cumsum_sales
0 A 4 4
1 A 7 11
2 A 10 21
3 A 5 26
4 A 8 34
5 B 9 9
6 B 12 21
7 B 15 36
8 B 10 46
9 B 8 54
The cumsum_sales column shows the cumulative sales, grouped by each store.
Note: You can find the complete documentation for the cumsum function in pandas here.
Additional Resources
The following tutorials explain how to perform other common tasks in pandas:
How to Sum Specific Columns in Pandas
How to Sum Columns Based on a Condition in Pandas
How to Calculate a Reversed Cumulative Sum in Pandas