The Method of Least Squares

When we fit a regression line to a set of points, we assume that there is some unknown linear relationship between Y and X, and that for every one-unit increase in X, Y increases by some set amount on average. Our fitted regression line enables us to predict the response, Y, for a given value of X.

$$ \mu_{Y|X}=\beta_0+\beta_1X $$

But for any specific observation, the actual value of Y can deviate from the predicted value. The deviations between the actual and predicted values are called errors, or residuals.

The better the line fits the data, the smaller the residuals (on average). How do we find the line that best fits the data? In other words, how do we determine values of the intercept and slope for our regression line? Intuitively, if we were to manually fit a line to our data, we would try to find a line that minimizes the model errors, overall. But, when we fit a line through data, some of the errors will be positive and some will be negative. In other words, some of the actual values will be larger than their predicted value (they will fall above the line), and some of the actual values will be less than their predicted values (they'll fall below the line). 

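If we add up all of the errors, the positive and negative values cancel out; for the least squares line, the sum is exactly zero. So the raw sum can't tell us how well a line fits. Instead, we use a little trick: we square the errors and find the line that minimizes the sum of the squared errors.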

$$ \sum_{i=1}^{n} e_i^2=\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2 $$

This method, the method of least squares, finds values of the intercept and slope coefficient that minimize the sum of the squared errors.
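To see why the raw sum of errors can't measure fit while the sum of squared errors can, here is a minimal Python sketch. The five data points are made up for illustration, and the helper names `residuals` and `sse` are hypothetical. Both candidate lines pass through the point (3, 4), the mean of X and the mean of Y for this data, so their errors sum to zero, yet their sums of squared errors differ.

```python
# Illustrative data, made up for this sketch
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def residuals(xs, ys, b0, b1):
    """Return e_i = Y_i - Yhat_i for the line Yhat = b0 + b1 * X."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def sse(xs, ys, b0, b1):
    """Sum of squared errors for the line Yhat = b0 + b1 * X."""
    return sum(e ** 2 for e in residuals(xs, ys, b0, b1))

# Two lines that both pass through the point of means (3, 4)
flat = (4.0, 0.0)    # horizontal line Yhat = 4
tilted = (2.2, 0.6)  # Yhat = 2.2 + 0.6 X

for b0, b1 in (flat, tilted):
    total = sum(residuals(xs, ys, b0, b1))
    print(f"b0={b0}, b1={b1}: sum of errors = {total:.6f}, "
          f"SSE = {sse(xs, ys, b0, b1):.2f}")
```

Both lines report a sum of errors of essentially zero (up to floating-point rounding), while the SSE is 6.00 for the flat line and 2.40 for the tilted one, so only the squared version tells them apart.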

To illustrate the concept of least squares, we use the Demonstrate Regression teaching module.
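Outside the teaching module, the same minimization can be sketched in a few lines of code. Setting the partial derivatives of the sum of squared errors with respect to the intercept and slope to zero gives the standard closed-form estimates: slope b1 = Sxy / Sxx and intercept b0 = Ȳ − b1·X̄, where Sxy and Sxx are sums of centered cross-products and squares. The function name `least_squares` and the data below are illustrative assumptions, not part of the module.

```python
def least_squares(xs, ys):
    """Return (b0, b1) minimizing sum((y - (b0 + b1*x))**2).

    Uses the closed-form solution for simple linear regression:
        b1 = S_xy / S_xx,   b0 = ybar - b1 * xbar
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    s_xx = sum((x - xbar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    b1 = s_xy / s_xx
    b0 = ybar - b1 * xbar
    return b0, b1

# Illustrative data, made up for this sketch
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares(xs, ys)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")  # intercept = 2.20, slope = 0.60
```

Because b0 is defined as Ȳ − b1·X̄, the fitted line always passes through the point (X̄, Ȳ), which is also why its errors sum to zero.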

Visualizing the method of least squares

Let’s look at the method of least squares from another perspective. Imagine that you’ve plotted some data using a scatterplot, and that you draw a horizontal line at the mean of Y through the data. Let’s lock this line in place, and attach springs between the data points and the line.

Some of the data points are farther from the mean line, so their springs are stretched more than others. The springs that are stretched the farthest exert the greatest force on the line.

What if we unlock this mean line and let it rotate freely around the point at the mean of X and the mean of Y? The springs pull on the line, rotating it until their forces balance. At that point, the line has settled where the overall energy stored in the springs is smallest.
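The rotating-line picture can be mimicked numerically. The sketch below is a stand-in for the physical model rather than a simulation of springs: it keeps the line pinned at (X̄, Ȳ), tries many slopes, and keeps the one with the smallest sum of squared errors. The slope range of -2 to 2 and the step of 0.001 are arbitrary assumptions chosen to suit the made-up data.

```python
# Illustrative data, made up for this sketch
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

def sse_through_means(slope):
    """SSE for the line pivoting on (xbar, ybar): Yhat = ybar + slope*(X - xbar)."""
    return sum((y - (ybar + slope * (x - xbar))) ** 2 for x, y in zip(xs, ys))

# "Rotate" the line by trying slopes from -2 to 2 in steps of 0.001
candidates = [i / 1000 for i in range(-2000, 2001)]
best = min(candidates, key=sse_through_means)
print(f"best slope ~ {best:.3f}, SSE = {sse_through_means(best):.3f}")
```

For this data the search settles on a slope of 0.600 with an SSE of 2.400, the same slope the closed-form least squares formula gives. A grid search is only an illustration; in practice the closed form is exact and cheaper.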

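There's some cool physics at play, involving the relationship between force and the energy needed to stretch a spring a given distance. By Hooke's law, the energy stored in a stretched spring is proportional to the square of how far it is stretched. If the springs are identical and attached vertically, each spring's stretch is the residual for its data point, so the total energy is proportional to the sum of the squared residuals. Minimizing the overall energy in the springs is therefore equivalent to fitting a regression line using the method of least squares.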
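A quick numeric check of the spring argument, under stated assumptions: identical springs with a hypothetical stiffness `K`, attached vertically so each is stretched by exactly one residual, each storing ½·k·e² of energy by Hooke's law. The sketch compares total spring energy with half the SSE for a few lines on made-up data.

```python
K = 1.0  # hypothetical spring stiffness; any positive constant works

# Illustrative data, made up for this sketch
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def spring_energy(b0, b1, k=K):
    """Total energy in vertical springs joining each point to Yhat = b0 + b1*X.

    Each spring is stretched by the residual e_i = y_i - (b0 + b1*x_i)
    and, by Hooke's law, stores (1/2) * k * e_i**2.
    """
    return sum(0.5 * k * (y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def sse(b0, b1):
    """Sum of squared errors for the line Yhat = b0 + b1*X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Energy always equals (k/2) * SSE, so the same line minimizes both
for line in [(4.0, 0.0), (2.2, 0.6), (1.0, 1.0)]:
    print(f"line {line}: energy = {spring_energy(*line):.3f}, "
          f"K/2 * SSE = {0.5 * K * sse(*line):.3f}")
```

Because the energy is just a positive constant times the SSE, changing `K` rescales every line's energy equally and never changes which line wins.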