Home » How to Calculate SST, SSR, and SSE in Python

How to Calculate SST, SSR, and SSE in Python

by Erma Khan

We often use three different sum of squares values to measure how well a regression line fits a dataset:

1. Sum of Squares Total (SST) – The sum of squared differences between individual data points (yi) and the mean of the response variable (y).

  • SST = Σ(yiy)2

2. Sum of Squares Regression (SSR) – The sum of squared differences between predicted data points (ŷi) and the mean of the response variable(y).

  • SSR = Σ(ŷiy)2

3. Sum of Squares Error (SSE) – The sum of squared differences between predicted data points (ŷi) and observed data points (yi).

  • SSE = Σ(ŷi – yi)2

The following step-by-step example shows how to calculate each of these metrics for a given regression model in Python.

Step 1: Create the Data

First, let’s create a dataset that contains the number of hours studied and exam score received for 20 different students at a certain university:

import pandas as pd

#create pandas DataFrame
df = pd.DataFrame({'hours': [1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
                             3, 4, 4, 4, 5, 5, 6, 7, 7, 8],
                   'score': [68, 76, 74, 80, 76, 78, 81, 84, 86, 83,
                             88, 85, 89, 94, 93, 94, 96, 89, 92, 97]})

#view first five rows of DataFrame
df.head()

	hours	score
0	1	68
1	1	76
2	1	74
3	2	80
4	2	76

Step 2: Fit a Regression Model

Next, we’ll use the OLS() function from the statsmodels library to fit a simple linear regression model using score as the response variable and hours as the predictor variable:

import statsmodels.api as sm

#define response variable
y = df['score']

#define predictor variable
x = df[['hours']]

#add constant to predictor variables
x = sm.add_constant(x)

#fit linear regression model
model = sm.OLS(y, x).fit()

Step 3: Calculate SST, SSR, and SSE

Lastly, we can use the following formulas to calculate the SST, SSR, and SSE values of the model:

import numpy as np

#calculate sse
sse = np.sum((model.fittedvalues - df.score)**2)
print(sse)

331.07488479262696

#calculate ssr
ssr = np.sum((model.fittedvalues - df.score.mean())**2)
print(ssr)

917.4751152073725

#calculate sst
sst = ssr + sse
print(sst)

1248.5499999999995

The metrics turn out to be:

  • Sum of Squares Total (SST): 1248.55
  • Sum of Squares Regression (SSR): 917.4751
  • Sum of Squares Error (SSE): 331.0749

We can verify that SST = SSR + SSE:

  • SST = SSR + SSE
  • 1248.55 = 917.4751 + 331.0749

Additional Resources

You can use the following calculators to automatically calculate SST, SSR, and SSE for any simple linear regression line:

The following tutorials explain how to calculate SST, SSR, and SSE in other statistical software:

Related Posts