Florida Lakes again

library(Lock5withR)
library(mosaic)   # provides mplot() and, via ggformula, gf_point() and friends
gf_point(AvgMercury ~ Alkalinity, data = FloridaLakes)

Fit the model

model1 <- lm(AvgMercury ~ Alkalinity, data = FloridaLakes)

Residual Plots check several things

Residual plots let us check three model assumptions: linearity, independence, and equal standard deviation of the residuals.

We can plot

  • residuals vs x,
  • residuals vs fits,
  • residuals vs order.

The most commonly used is residuals vs fits because it generalizes to models with multiple predictors. Whether order is meaningful depends on how the data were collected and stored.

gf_point(AvgMercury ~ Alkalinity, data = FloridaLakes) %>% gf_lm()

gf_point(resid(model1) ~ Alkalinity, data = FloridaLakes)

gf_point(resid(model1) ~ fitted(model1), data = FloridaLakes)
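
The third plot in the list, residuals vs order, is not shown above. A minimal sketch, assuming the row order of FloridaLakes reflects something meaningful (such as collection order), which the data set itself does not guarantee; the names order_index and p_order are only for illustration:

```r
library(Lock5withR)
library(mosaic)

model1 <- lm(AvgMercury ~ Alkalinity, data = FloridaLakes)

# Position of each lake in the data frame (1, 2, ..., n).
# Assumption: this order is meaningful; if it is not, the plot tells us little.
order_index <- seq_len(nrow(FloridaLakes))

# Residuals plotted against row order; a trend or drift would suggest
# that the observations are not independent.
p_order <- gf_point(resid(model1) ~ order_index, data = FloridaLakes) %>%
  gf_labs(x = "row order", y = "residual")
p_order
```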

mplot(model1, which = 1)

Residual QQ plots can check for normality

We can check normality using a normal-quantile plot.

gf_qq( ~ resid(model1)) %>% gf_qqline()

mplot(model1, which = 2)
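
To see what a normal-quantile plot is comparing, here is a rough sketch that builds one by hand: the sorted residuals are plotted against the standard normal quantiles at the same plotting positions. The use of ppoints() for the plotting positions follows base R's qqnorm(); the names qq_data and qq_manual are only for illustration.

```r
library(Lock5withR)
library(mosaic)

model1 <- lm(AvgMercury ~ Alkalinity, data = FloridaLakes)

# Sorted residuals paired with the normal quantiles we would expect
# at the same ranks if the residuals were normally distributed.
qq_data <- data.frame(
  observed    = sort(resid(model1)),
  theoretical = qnorm(ppoints(length(resid(model1))))
)

# Points close to a straight line are consistent with normality.
qq_manual <- gf_point(observed ~ theoretical, data = qq_data)
qq_manual
```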

Can we do better?

The first thing to try is a transformation of one or both variables. Commonly useful transformations include:

  • logarithm – especially when “doubling” is meaningful
  • square root – not as strong as logarithm
  • reciprocal – especially when a “rate” is involved

model2 <- lm(AvgMercury ~ log(Alkalinity), data = FloridaLakes)
model3 <- lm(log(AvgMercury) ~ Alkalinity, data = FloridaLakes)
model4 <- lm(log(AvgMercury) ~ log(Alkalinity), data = FloridaLakes)
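
One way to see why the logarithm suits "doubling": when the predictor is logged, as in model2, every doubling of Alkalinity changes the predicted AvgMercury by the same amount, slope × log(2). A small sketch of that check; the alkalinity values 10 and 20 are arbitrary and chosen only because one is double the other:

```r
library(Lock5withR)

model2 <- lm(AvgMercury ~ log(Alkalinity), data = FloridaLakes)

# Slope on the log(Alkalinity) scale.
slope <- coef(model2)[["log(Alkalinity)"]]

# Predicted change in AvgMercury for any doubling of Alkalinity.
change_per_doubling <- slope * log(2)

# Check against model predictions at two illustrative alkalinity values.
preds <- predict(model2, newdata = data.frame(Alkalinity = c(10, 20)))
observed_change <- unname(preds[2] - preds[1])

all.equal(observed_change, change_per_doubling)
```

The same reasoning applies to model4, except that the change is then on the log(AvgMercury) scale.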

Model 2

gf_point(AvgMercury ~ log(Alkalinity), data = FloridaLakes) %>% gf_lm()

mplot(model2, which = 1:2)

Model 3

gf_point(log(AvgMercury) ~ Alkalinity, data = FloridaLakes) %>% gf_lm()

mplot(model3, which = 1:2)

Model 4

gf_point(log(AvgMercury) ~ log(Alkalinity), data = FloridaLakes) %>% gf_lm()

mplot(model4, which = 1:2)