Often you may want to extract a summary of a regression model created using scikit-learn in Python.
Unfortunately, scikit-learn offers few built-in functions for summarizing a regression model, since the library is geared toward prediction rather than statistical inference.
So, if you’re interested in getting a summary of a regression model in Python, you have two options:
1. Use limited functions from scikit-learn.
2. Use statsmodels instead.
The following examples show how to use each method in practice with the following pandas DataFrame:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                   'x2': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4],
                   'y': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})

#view first five rows of DataFrame
df.head()

   x1  x2   y
0   1   1  76
1   2   3  78
2   2   3  85
3   4   5  88
4   2   2  72
Method 1: Get Regression Model Summary from Scikit-Learn
We can use the following code to fit a multiple linear regression model using scikit-learn:
from sklearn.linear_model import LinearRegression
#instantiate linear regression model
model = LinearRegression()
#define predictor and response variables
X, y = df[['x1', 'x2']], df.y
#fit regression model
model.fit(X, y)
We can then use the following code to extract the regression coefficients of the model along with the R-squared value of the model:
#display regression coefficients and R-squared value of model
print(model.intercept_, model.coef_, model.score(X, y))
70.4828205704 [ 5.7945 -1.1576] 0.766742556527
Using this output, we can write the equation for the fitted regression model:
y = 70.48 + 5.79x1 – 1.16x2
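As a quick check of this equation, the following sketch refits the same model and compares a prediction computed by hand with the one returned by `model.predict()`. The new observation (x1 = 3, x2 = 2) is a made-up value used only for illustration and is not part of the original data.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

#recreate the DataFrame and refit the model from above
df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                   'x2': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4],
                   'y': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})
X, y = df[['x1', 'x2']], df['y']
model = LinearRegression().fit(X, y)

#hypothetical new observation (assumption, not in the original data)
new_obs = pd.DataFrame({'x1': [3], 'x2': [2]})

#prediction computed by hand from the fitted equation
b0 = model.intercept_
b1, b2 = model.coef_
manual_pred = b0 + b1 * 3 + b2 * 2

#prediction returned by scikit-learn
sklearn_pred = model.predict(new_obs)[0]

print(manual_pred, sklearn_pred)
```

Both numbers should agree, at roughly 85.55 for this observation.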
We can also see that the R-squared value of the model is 0.7667.
This means that 76.67% of the variation in the response variable can be explained by the two predictor variables in the model.
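The R-squared value can also be reproduced from its definition, 1 − SS_res / SS_tot, where SS_res is the residual sum of squares and SS_tot is the total sum of squares. The following is a minimal sketch of that calculation; the names `ss_res`, `ss_tot` and `r_squared` are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

#recreate the DataFrame and refit the model from above
df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                   'x2': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4],
                   'y': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})
X, y = df[['x1', 'x2']], df['y']
model = LinearRegression().fit(X, y)

#residual sum of squares: variation left unexplained by the model
ss_res = np.sum((y - model.predict(X)) ** 2)

#total sum of squares: variation of y around its mean
ss_tot = np.sum((y - y.mean()) ** 2)

#R-squared is the share of total variation that the model explains
r_squared = 1 - ss_res / ss_tot

print(r_squared, model.score(X, y))
```

The two printed values should be identical up to floating-point error, since `model.score()` returns this same quantity for a regression model.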
Although this output is useful, we still don’t know the overall F-statistic of the model, the p-values of the individual regression coefficients, and other useful metrics that can help us understand how well the model fits the dataset.
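Some of these metrics can still be derived by hand from the R-squared value. The sketch below shows one way to get the adjusted R-squared and the overall F-statistic with its p-value. It assumes SciPy is installed (it is not used elsewhere in this tutorial), and the names `n`, `k`, `adj_r2`, `f_stat` and `f_pvalue` are illustrative.

```python
import pandas as pd
from scipy import stats
from sklearn.linear_model import LinearRegression

#recreate the DataFrame and refit the model from above
df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                   'x2': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4],
                   'y': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})
X, y = df[['x1', 'x2']], df['y']
model = LinearRegression().fit(X, y)

#n = number of observations, k = number of predictors
n, k = X.shape
r2 = model.score(X, y)

#adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

#overall F-statistic for a model with an intercept
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

#p-value: upper tail of the F distribution with (k, n - k - 1) degrees of freedom
f_pvalue = stats.f.sf(f_stat, k, n - k - 1)

print(adj_r2, f_stat, f_pvalue)
```

These values should line up with the Adj. R-squared, F-statistic and Prob (F-statistic) entries (0.708, 13.15 and 0.00296) in the statsmodels summary shown in Method 2. The p-values of the individual coefficients take more work to compute by hand, which is why statsmodels is usually the easier route.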
Method 2: Get Regression Model Summary from Statsmodels
If you’re interested in extracting a summary of a regression model in Python, you’re better off using the statsmodels package.
The following code shows how to use this package to fit the same multiple linear regression model as the previous example and extract the model summary:
import statsmodels.api as sm
#define response variable
y = df['y']
#define predictor variables
x = df[['x1', 'x2']]
#add constant to predictor variables
x = sm.add_constant(x)
#fit linear regression model
model = sm.OLS(y, x).fit()
#view model summary
print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.767
Model:                            OLS   Adj. R-squared:                  0.708
Method:                 Least Squares   F-statistic:                     13.15
Date:                Fri, 01 Apr 2022   Prob (F-statistic):            0.00296
Time:                        11:10:16   Log-Likelihood:                -31.191
No. Observations:                  11   AIC:                             68.38
Df Residuals:                       8   BIC:                             69.57
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         70.4828      3.749     18.803      0.000      61.839      79.127
x1             5.7945      1.132      5.120      0.001       3.185       8.404
x2            -1.1576      1.065     -1.087      0.309      -3.613       1.298
==============================================================================
Omnibus:                        0.198   Durbin-Watson:                   1.240
Prob(Omnibus):                  0.906   Jarque-Bera (JB):                0.296
Skew:                          -0.242   Prob(JB):                        0.862
Kurtosis:                       2.359   Cond. No.                         10.7
==============================================================================
Notice that the regression coefficients and the R-squared value match those calculated by scikit-learn, but we’re also provided with a ton of other useful metrics for the regression model.
For example, we can see the p-values for each individual predictor variable:
- p-value for x1 = 0.001
- p-value for x2 = 0.309
We can also see the overall F-statistic of the model, the adjusted R-squared value, the AIC value of the model, and much more.