Regression with Linear Algebra
Linear algebra is a powerful tool in any mathematician’s arsenal; today, we explore how matrices can be used to find lines and curves of best fit for a given set of data points.
Motivation
Regression is intimately tied to the idea of projection matrices: a projection matrix P maps a vector b onto the column space of a matrix A, the subspace spanned by its columns.
For a given set of data points, there may not be a straight line that passes through every point; it is far more likely that we can only produce a ‘line of best fit’, which minimises the overall distance between the line and the points of our dataset.
In matrix form, this means that Ax = b is not solvable, as there is no straight line that can pass through every point and satisfy each equation. Instead, we will project the vector b onto the column space C(A) of A, forming a vector p, and solve for coefficients of our line of best fit which will be able to produce the vector p.
The projection is orthogonal, meaning that b is projected to the point of C(A) which is closest to it. This allows us to assert that the line produced by the coefficients found through this method is indeed the ‘best fit’; the orthogonal projection of b onto C(A), which we name p, is the closest that Ax can get to the original vector b.
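For reference, when the columns of A are independent (so that AᵀA is invertible), this orthogonal projection can be written in a standard closed form:

p = Pb, \qquad P = A\,(A^{\mathsf T}A)^{-1}A^{\mathsf T}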
Finding the coefficients of the line of best fit for a set of points, and showing that it really is the ‘best’ line, is the task of regression. We find these coefficients by minimising the error, defined as the sum of the squares of the vertical distances between each point and the line; minimising this sum of squared distances is what gives the method its name: least squares.
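Concretely, if ŷᵢ denotes the height of a candidate line above the data point (xᵢ, yᵢ), the quantity being minimised is

E = \sum_{i} \left( y_i - \hat{y}_i \right)^2 = \lVert b - Ax \rVert^2,

the squared distance between b and the vector Ax described above.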
Example: Linear Regression with Least Squares
We will now produce a line of best fit of the form y = Cx + D (writing the coefficients as C and D so that A can be reserved for a matrix) for the given points (1, 1), (2, 1), and (3, 2).
If the line went through all three points, we would have three equations which we could solve simultaneously to find C and D.
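Substituting each point into y = Cx + D gives:

\begin{aligned}
C \cdot 1 + D &= 1 \\
C \cdot 2 + D &= 1 \\
C \cdot 3 + D &= 2
\end{aligned}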
This is a system of linear equations, which we can now write in the language of linear algebra using matrices and vectors.
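In matrix-vector form, the pieces are:

A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}, \qquad
x = \begin{bmatrix} C \\ D \end{bmatrix}, \qquad
b = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}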
We describe this system as Ax = b; a matrix of coefficients A is applied to a vector of unknowns x, producing a solution vector b. Unfortunately, A is a rectangular 3 × 2 matrix, so it has no inverse and the overdetermined system has no exact solution; we cannot solve it directly.
Instead, we multiply both sides by Aᵀ and solve the related normal equation:

AᵀAx̂ = Aᵀb
AᵀA is a square matrix, and it is invertible as long as A has independent columns; this is true for our matrix A, so we can multiply both sides of the normal equation by the inverse of AᵀA to solve for the coefficients. In fact, the normal equation is exactly the equation for projecting b onto the column space of A; this is what ensures that the coefficients we find will indeed produce the line of best fit.
Note the caret (hat) over x in the normal equation: this is to emphasise that while x̂ is our best possible solution, it is still not an exact solution of the original equation Ax = b. This corresponds to our geometric intuition of a straight line not always being able to pass through every point contained within a dataset.
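Working this out for our three data points:

A^{\mathsf T}A = \begin{bmatrix} 14 & 6 \\ 6 & 3 \end{bmatrix}, \qquad
A^{\mathsf T}b = \begin{bmatrix} 9 \\ 4 \end{bmatrix}, \qquad
\hat{x} = (A^{\mathsf T}A)^{-1}A^{\mathsf T}b
        = \frac{1}{6}\begin{bmatrix} 3 & -6 \\ -6 & 14 \end{bmatrix}\begin{bmatrix} 9 \\ 4 \end{bmatrix}
        = \begin{bmatrix} 1/2 \\ 1/3 \end{bmatrix}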
This gives us C = 1/2 and D = 1/3, which we now substitute into our original line of best fit equation, producing y = x/2 + 1/3.
At x = 1, 2, and 3 this line gives 5/6, 4/3, and 11/6, so the errors at the three points are 1/6, −1/3, and 1/6, and the sum of squared errors is 1/6. The projection argument above guarantees that no other straight line can do better, so this sum has indeed been minimised. Our least squares linear regression is complete.
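If we want to verify this numerically, here is a minimal sketch using NumPy that solves the same normal equation and confirms both the coefficients and the sum of squared errors:

import numpy as np

# Data points (1, 1), (2, 1), (3, 2)
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 2.0])

# Each row of A is [x_i, 1], matching the model y = C*x + D
A = np.column_stack([x, np.ones_like(x)])

# Solve the normal equation AᵀA x̂ = Aᵀ b
C, D = np.linalg.solve(A.T @ A, A.T @ y)
print(C, D)                    # 0.5 and approximately 0.333

# Residuals b − Ax̂ and the minimised sum of squared errors
residuals = y - A @ np.array([C, D])
print(residuals)               # approximately [ 0.167 -0.333  0.167]
print(residuals @ residuals)   # approximately 0.167, i.e. 1/6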
Is non-linear regression possible?
Yes! In our example we chose to fit the points to the function y = Cx + D, but we could equally have chosen y = Cx² + Dx + E, or many other functions. As long as the function is linear in its unknown coefficients, as any polynomial is, substituting (x, y) pairs into it still gives a system of linear equations, and thus we can utilise the same least squares method to solve for the coefficients.
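As an illustration, here is a minimal sketch in NumPy of fitting a quadratic y = Cx² + Dx + E; the data points are hypothetical, chosen only for demonstration, and the only real change from the linear case is that A gains a column of x² values:

import numpy as np

# Hypothetical example points, to be fit with y = C*x^2 + D*x + E
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 4.8, 9.2, 16.5])

# Each row of A is [x_i^2, x_i, 1]; the system is still linear in (C, D, E)
A = np.column_stack([x**2, x, np.ones_like(x)])

# Least squares solution of Ax̂ ≈ y, using the same normal-equation idea as before
coefficients, *_ = np.linalg.lstsq(A, y, rcond=None)
C, D, E = coefficients
print(C, D, E)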
Further Exploration
Lecture 16 of 18.06SC Linear Algebra ― Projection Matrices and Least Squares
US Particle Accelerator School ― Least Squares Fitting