Linear Model Function

The lm() function (short for "linear model") belongs to the stats package that ships with base R and is used, as the name suggests, to fit a linear model; the model can include multiple variables, including interaction terms and squared terms. A brief discussion of its use is provided below.

Syntax

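Much like we use '==' instead of just '=' inside an 'if' statement, we do not use the lone equals sign when writing model equations in R; instead we use a tilde, '~', located at the top left of most US keyboards. The response variable goes to the left of the tilde and the predictors go to the right. Otherwise, our equations follow the same format as when we write them by hand, except that we leave out the coefficients, which R estimates for us.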
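
As a small illustration of the tilde syntax, the sketch below stores an equation in a variable and inspects it. The name f is only a placeholder chosen for this example; class() and all.vars() are base R functions.

```r
# A formula is an ordinary R object; nothing is fitted when it is created
f = mpg ~ wt

class(f)      # "formula"
all.vars(f)   # "mpg" "wt": the response first, then the predictor
```

Because the formula is just an object, it can be built once and passed to lm() later.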

Create a Linear Model

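The lm() function takes two main arguments, as shown below:

lm(formula, data)
  • formula: the equation you are using to create your model
  • data: the name of the dataset from which you are building the model

R matches named arguments first, so lm(data = mtcars, mpg ~ wt) and lm(mpg ~ wt, data = mtcars) are equivalent.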

For this discussion, we will be using the mtcars dataset, included in base R, to demonstrate.
Let's begin by modeling a car's miles per gallon (mpg) as a function of its weight (wt):

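library(ggplot2)
# Fit mpg as a linear function of weight
myModel = lm(mpg ~ wt, data = mtcars)
myModel
# Plot the data, then draw the fitted line using the model's own coefficients
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point(col = 'red') +
  geom_abline(intercept = coef(myModel)[[1]], slope = coef(myModel)[[2]])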

As you can see, creating a model with R is very simple. Note also that inside lm() we refer to columns by their bare names: we do not need to use a $ or enter the variable names as strings, because the data argument tells R where to find them.
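
Once the model is fitted, its numbers can be pulled out rather than copied by hand. The following is a minimal sketch using the base R functions coef(), summary(), and predict(); the data frame newCars and its weights are made up for illustration.

```r
# Refit the model from the text: mpg as a linear function of weight
myModel = lm(mpg ~ wt, data = mtcars)

# coef() returns the fitted intercept and slope as a named numeric vector
coef(myModel)

# summary() adds standard errors, p-values, and R-squared
summary(myModel)

# predict() applies the fitted equation to new data.
# newCars is hypothetical; wt in mtcars is measured in 1000s of pounds.
newCars = data.frame(wt = c(2.5, 3.5))
predict(myModel, newdata = newCars)
```

The column name in newCars must match the variable name used in the formula, or predict() will not find it.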

The fmodel() function from the 'statisticalModeling' package is useful for plotting the line of the equation and requires only the name of the model (myModel in this case), though it does not draw the data points on the plot; we use ggplot here so that the points are easily visible for this demonstration.

Multiple terms

While the mtcars data serves as a simple example, the data you'll encounter in the workplace is far more complex, and a model built on a single variable will rarely be enough. Luckily, adding variables to our model is quite simple: we use the '+' symbol, followed by our new term, as demonstrated below:

myModel = lm(data = mtcars, mpg ~ wt + hp)
myModel
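
To see what the extra term does, the sketch below fits the one-variable and two-variable models side by side. The names oneTerm and twoTerms are chosen only for this example.

```r
oneTerm  = lm(mpg ~ wt, data = mtcars)
twoTerms = lm(mpg ~ wt + hp, data = mtcars)

# Each term added with '+' receives its own coefficient
names(coef(twoTerms))   # "(Intercept)" "wt" "hp"

# R-squared cannot decrease when a predictor is added to the same model
summary(oneTerm)$r.squared
summary(twoTerms)$r.squared
```

A higher R-squared alone does not show that the added variable belongs in the model; it is only a quick way to see that the fit changed.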

Higher Order Terms

However, sometimes a relationship between variables isn't purely linear: it may have a quadratic term, or perhaps the effect of one variable depends on the value of another, and we need an interaction term in the model. While we could add these as new columns in our dataset using R and dplyr, we do not need to do that here.
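
For the quadratic case, a minimal sketch follows. Inside a formula the '^' symbol has a special meaning (it crosses terms rather than raising them to a power), so the squared term is wrapped in the base R function I() to make R treat it as ordinary arithmetic. The name quadModel is chosen only for this example.

```r
# I() tells R to read ^ as arithmetic rather than as formula syntax
quadModel = lm(mpg ~ wt + I(wt^2), data = mtcars)
names(coef(quadModel))   # "(Intercept)" "wt" "I(wt^2)"

# Without I(), wt^2 collapses to plain wt, so no squared term is fitted
names(coef(lm(mpg ~ wt^2, data = mtcars)))   # "(Intercept)" "wt"
```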

To create an interaction term, we can use one of two methods: if we wish to include both terms and their interaction in the model, we can simply use the asterisk:

myModel = lm(data = mtcars, mpg ~ wt * hp)
myModel 

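However, if we wish to include only one (or neither) of the individual terms, we write the interaction itself with a colon and add back any individual term we want with '+'. The model below includes the interaction alone: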

myModel = lm(data = mtcars, mpg ~ wt : hp)
myModel
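
The difference between the two operators can be checked directly from the coefficient names. In this sketch the model names full, expanded, and interactionOnly are chosen only for illustration.

```r
full            = lm(mpg ~ wt * hp, data = mtcars)
expanded        = lm(mpg ~ wt + hp + wt:hp, data = mtcars)
interactionOnly = lm(mpg ~ wt:hp, data = mtcars)

# The asterisk is shorthand for both terms plus their interaction
names(coef(full))                       # "(Intercept)" "wt" "hp" "wt:hp"
all.equal(coef(full), coef(expanded))   # TRUE

# The colon alone fits only the interaction (plus the intercept)
names(coef(interactionOnly))            # "(Intercept)" "wt:hp"
```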