The method of least squares finds the regression line that best fits a given dataset by minimizing the sum of the squared differences between the observed and predicted values.
We can use the np.linalg.lstsq() function in NumPy to perform least squares fitting.
The following step-by-step example shows how to use this function in practice.
Step 1: Enter the Values for X and Y
First, let’s create the following NumPy arrays:
import numpy as np

#define x and y arrays
x = np.array([6, 7, 7, 8, 12, 14, 15, 16, 16, 19])
y = np.array([14, 15, 15, 17, 18, 18, 19, 24, 25, 29])
Step 2: Perform Least Squares Fitting
We can use the following code to perform least squares fitting and find the line that best “fits” the data:
#perform least squares fitting
np.linalg.lstsq(np.vstack([x, np.ones(len(x))]).T, y, rcond=None)[0]

array([0.96938776, 7.76734694])

The lstsq() function returns a tuple. Its first element, selected with [0], is an array that contains the slope and intercept values for the line of best fit, in that order.
From the output we can see:
- Slope: 0.969
- Intercept: 7.767
Using these two values, we can write the equation for the line of best fit:
ŷ = 7.767 + 0.969x
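The one-line call above can be unpacked into named pieces to make each part easier to follow. The following is a minimal sketch, not the only way to do this. The names A, coeffs, slope_check and intercept_check are chosen here for illustration, and the closed-form formulas for simple linear regression are added only as a cross-check:

```python
import numpy as np

x = np.array([6, 7, 7, 8, 12, 14, 15, 16, 16, 19])
y = np.array([14, 15, 15, 17, 18, 18, 19, 24, 25, 29])

# design matrix: first column holds x, second column holds ones for the intercept
A = np.vstack([x, np.ones(len(x))]).T

# lstsq returns (solution, residuals, rank, singular values)
coeffs, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coeffs

# cross-check using the closed-form formulas for simple linear regression
slope_check = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept_check = y.mean() - slope_check * x.mean()

print(f"y_hat = {intercept:.3f} + {slope:.3f}x")  # y_hat = 7.767 + 0.969x
print(np.isclose(slope, slope_check), np.isclose(intercept, intercept_check))
```

The column order of the design matrix determines the order of the coefficients, which is why the slope comes first and the intercept second.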
Step 3: Interpret the Results
Here’s how to interpret the line of best fit:
- When x is equal to 0, the predicted average value for y is 7.767.
- For each one-unit increase in x, y increases by an average of 0.969.
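Both interpretations can be checked numerically. The sketch below assumes a small helper named predict(), which is hypothetical and not part of NumPy. It evaluates the fitted line at x = 0 and compares predictions one unit apart:

```python
import numpy as np

x = np.array([6, 7, 7, 8, 12, 14, 15, 16, 16, 19])
y = np.array([14, 15, 15, 17, 18, 18, 19, 24, 25, 29])

# fit the line: slope first, intercept second
A = np.vstack([x, np.ones(len(x))]).T
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

def predict(x_value):
    """Hypothetical helper: evaluate the fitted line at x_value."""
    return intercept + slope * x_value

# the prediction at x = 0 is the intercept
at_zero = predict(0)

# the change in prediction for a one-unit increase in x is the slope
per_unit = predict(11) - predict(10)

print(round(at_zero, 3), round(per_unit, 3))
```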
We can also use the line of best fit to predict the value of y based on the value of x.
For example, if x has a value of 10, then we predict that the value of y would be 17.457:
- ŷ = 7.767 + 0.969x
- ŷ = 7.767 + 0.969(10)
- ŷ = 17.457
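The same prediction can be computed in code. This is a minimal sketch in which the names x_new, y_pred and y_pred_many are chosen here for illustration. Because the code keeps the unrounded coefficients, the result comes out near 17.461 rather than the 17.457 obtained from the rounded values:

```python
import numpy as np

x = np.array([6, 7, 7, 8, 12, 14, 15, 16, 16, 19])
y = np.array([14, 15, 15, 17, 18, 18, 19, 24, 25, 29])

# fit the line: slope first, intercept second
A = np.vstack([x, np.ones(len(x))]).T
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

# predict y for a single new x value
x_new = 10
y_pred = intercept + slope * x_new
print(round(y_pred, 3))

# the same expression works element-wise on an array of x values
y_pred_many = intercept + slope * np.array([5, 10, 20])
print(np.round(y_pred_many, 3))
```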