Linear Models

What are linear models?

Linear models are a common method of modeling and analyzing data. They can be used to predict one or more response variables ($y$) from one or more factors or predictor variables ($x$). They can also be used to understand how changing the factors explains variability in the responses. An example of the general form of a linear model is

$y = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \beta_{12} x_{1} x_{2} + \beta_{11} x_{1}^{2} + \beta_{22} x_{2}^{2} + \varepsilon$,

where the $\beta$s represent the parameters of the model. That is, the response is predicted by a linear combination of terms involving the predictor variables. The constant ($\beta_{0}$) and the terms involving the predictor variables (for example, $\beta_{1} x_{1}$ or $\beta_{12} x_{1} x_{2}$) represent the signal, or systematic variation. The $\varepsilon$ term represents the noise, or random error.
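To make the split between signal and noise concrete, here is a minimal Python sketch using NumPy. It simulates data from the example equation and then estimates the $\beta$s by ordinary least squares. The parameter values, sample size, noise level, and the helper name `design_matrix` are all assumptions chosen for illustration; they do not come from any real data set.

```python
import numpy as np

def design_matrix(x1, x2):
    """Build columns 1, x1, x2, x1*x2, x1^2, x2^2 (same order as the equation)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

# Hypothetical "true" parameters (beta_0, beta_1, beta_2, beta_12, beta_11, beta_22).
beta_true = np.array([1.0, 2.0, -1.0, 0.5, 0.3, -0.2])

rng = np.random.default_rng(0)
x1 = rng.uniform(-2, 2, size=200)
x2 = rng.uniform(-2, 2, size=200)
X = design_matrix(x1, x2)

signal = X @ beta_true                   # systematic variation
noise = rng.normal(0.0, 0.1, size=200)   # random error (epsilon)
y = signal + noise

# Least-squares estimate of the parameters from the simulated data.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))
```

With this little noise, the estimates should land close to the assumed values; raising the noise level makes them scatter more widely around the same targets.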

Why are they called linear models?

The word "linear" in the name of these models refers to the $\beta$s. That is, the linear model is linear in the parameters. It does not have to be linear in the predictor variables, though. For example, the predictors can be combined (e.g., $x_{1} x_{2}$) or transformed (e.g., $x_{1}^{2}$), as you see in the example above. You could have a term involving $\log(x_{i})$, and it would still be a linear model.
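As a small illustration of "linear in the parameters", the following Python sketch fits $y = \beta_{0} + \beta_{1} \log(x)$ by ordinary least squares. The data are hypothetical and noise-free, generated from assumed values $\beta_{0} = 3$ and $\beta_{1} = 2$, so the fit should simply recover those values.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 3 + 2*log(x).
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = 3.0 + 2.0 * np.log(x)

# The model is nonlinear in x but linear in (beta_0, beta_1):
# each column of X is a known function of x, so least squares still applies.
X = np.column_stack([np.ones_like(x), np.log(x)])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

The transformation happens when the columns of `X` are built; the fitting step never needs to know that a logarithm was involved.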

What are some common linear models?

Linear models are classified according to the type of data (categorical or continuous) for the response variable and the predictor variables. You might have heard of the general linear model and the generalized linear model. Don’t get confused by terminology! They are two different models. The general linear model is for continuous responses when the error term is normally distributed. The generalized linear model also allows non-normal distributions of the error term, so it includes models for categorical responses.

Continuous response(s):

| Number of predictors | Categorical predictors | Continuous predictors | Categorical and continuous predictors |
|---|---|---|---|
| One | One-way ANOVA | Simple linear regression | n/a |
| Two | Two-way ANOVA | Multiple linear regression | Analysis of covariance |
| Many | N-way ANOVA | Multiple linear regression | General linear model |
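To show how a categorical predictor fits the same linear-model framework, here is a hedged Python sketch of the model behind one-way ANOVA. The factor is coded as one 0/1 indicator column per level, with no separate intercept column, which is one of several possible codings. The group labels and measurements are made up for illustration.

```python
import numpy as np

# Hypothetical measurements for three levels of one categorical factor.
groups = np.array(["a", "a", "a", "b", "b", "b", "c", "c", "c"])
y = np.array([4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1.0, 2.0, 3.0])

levels = np.unique(groups)  # sorted level names

# One indicator (0/1) column per level. With this coding and no intercept,
# each least-squares coefficient is the mean response of its level.
X = (groups[:, None] == levels[None, :]).astype(float)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

group_means = np.array([y[groups == g].mean() for g in levels])
print(dict(zip(levels.tolist(), beta_hat.round(2).tolist())))
```

The fitted coefficients match the per-group means, which is why ANOVA and regression can be treated as special cases of the same model.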

Categorical response(s):

| Categorical predictors | Continuous predictors | Categorical and continuous predictors |
|---|---|---|
| Contingency table analysis | Logistic regression | Generalized linear model |