One of the big advantages of the Bayesian framework is its flexibility. Last time we saw how to fit models of the form
\[\begin{align*} y &\sim \sf{Norm}(\mu, \sigma) \\ \mu &= f(x; \beta) \\ \beta_i &\sim ?? \\ \sigma &\sim ?? \end{align*}\] where \(f(x; \mathbf{\beta}) = \beta_0 + \beta_1 x\).
We can generalize this in several ways:
The regression function \(f(x; \theta)\) doesn’t need to be a linear function.1
\(f\) will still typically depend on a number of parameters (denoted collectively as \(\beta\) or \(\theta\), subscripted to get individual parameters) and our Bayesian engine will help us determine which values of these parameters are more or less plausible.
We don’t need to use a normal distribution to describe the individual level variability.
This is just another way of saying that we can use a different likelihood function.
We can choose different priors for any of the parameters involved.
We already know techniques for doing this, but we have more to learn about how to choose the priors and what difference those choices make.
We can add additional predictors, including predictors that are not quantitative.
So far, we have seen one simple example of this – a simple quantitative predictor (weight) plus a binary predictor coded as 0 or 1 (sex).
Today’s focus is on item 1 above – working with other functions. In particular, we will be looking at polynomial regression.
Let’s add the kids back into our Howell data.
library(rethinking)
library(ggformula)   # gf_point(), gf_ribbon()
library(dplyr)       # mutate(), bind_rows(), bind_cols(), %>%
library(tidybayes)   # mean_hdi()
data(Howell1)
gf_point(height ~ weight, data = Howell1, alpha = 0.7, shape = 21)
A line is not going to be a very good choice now since there is clearly a bend in the trend.
A constant standard deviation might also not be a good choice – there seems to be more variation for larger weights than for smaller weights.2 But for now we are focusing on the shape.
Polynomials, especially high-degree polynomials, are usually not the best choice for the model function when the relationship between \(x\) and \(y\) is more complex than a simple line. But a low-degree polynomial (a quadratic, say) can sometimes work well.
If there is some underlying mechanistic model explaining the phenomenon, it is better to reverse engineer that to create the model function than to simply grab a function because it is easy and happens to “fit well”.
If we just need a function that is “wigglier”, there are often better options than polynomials; we’ll see one of these next time.
Another way to improve a fit is to include additional predictors or to apply transformations to the variables. We will learn how to do that soon too.
We will look at polynomial regression first because it is relatively easy and does have some important applications, but it is not the only solution and often not the best one.
The natural way to express a quadratic model would be
\[ f(x; \beta) = \beta_0 + \beta_1 x + \beta_2 x^2 \]
But we will almost always want to standardize our predictor when fitting a model like this. One reason is that when \(x\) is large, \(x^2\), \(x^3\), etc. will be enormous. Many numerical procedures work best when all the quantities involved are on roughly the same, moderate scale. The most common form of standardization is the z-score:
\[ \mbox{z-score} = \frac{\mbox{value} - \mbox{mean value}}{\mbox{standard deviation of values}} \]
It is simplest to do this as a precalculation in the data – especially if we need to use the value in more than one place. We can make things a bit more efficient if we precompute all the squares of these standardized weights as well (otherwise they would get recalculated for each iteration of the numerical algorithm used to estimate the posterior).
Howell <-
Howell1 %>%
mutate(
weight_s = (weight - mean(weight)) / sd(weight),
weight_s2 = weight_s^2
)
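A quick sanity check: after standardizing, weight_s should have mean (essentially) 0 and standard deviation 1.
# standardized weights: mean should be ~0 (up to rounding), sd exactly 1
mean(Howell$weight_s)
sd(Howell$weight_s)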
m4.5 <-
quap(
data = Howell,
alist(
height ~ dnorm(mu, sigma),
mu <- beta_0 + beta_1 * weight_s + beta_2 * weight_s2,
beta_0 ~ dnorm(178, 20),
beta_1 ~ dlnorm(0, 1),
beta_2 ~ dnorm(0, 1),
sigma ~ dunif(0, 50)
)
)
Weights <-
tibble(
weight_s = seq(-3, 3, by = 0.1),
weight_s2 = weight_s^2)
Link4.5 <-
link(m4.5, data = Weights)
Sim4.5 <-
sim(m4.5, data = Weights)
Avg4.5 <-
apply(Link4.5, 2, mean_hdi) %>%
bind_rows() %>%
bind_cols(Weights)
Ind4.5 <-
apply(Sim4.5, 2, mean_hdi) %>%
bind_rows() %>%
bind_cols(Weights)
p4.5s <-
gf_ribbon(ymin + ymax ~ weight_s, data = Ind4.5, fill = "blue") %>%
gf_ribbon(ymin + ymax ~ weight_s, data = Avg4.5, fill = "red") %>%
gf_point(
title = "degree: 2",
height ~ weight_s, data = Howell, inherit = FALSE,
shape = 21, alpha = 0.4)
p4.5s
link() and sim() produce matrices with a row for each posterior sample and a column for each set of explanatory variables used. So when we run
Link4.5 <-
link(m4.5, data = Weights)
the resulting matrix has a column for each row of Weights.
dim(Link4.5)
## [1] 1000 61
dim(Weights)
## [1] 61 2
Typically we want to run some computation along each column of the matrix to obtain the posterior mean, HDI, mode, etc. After getting this into a 61-row data frame, we can add on Weights so we can identify which row is which.
Avg4.5 <-
apply(Link4.5, 2, mean_hdi) %>% # a list of 61 1-row data frames
bind_rows() %>% # a single data frame with 61 rows
bind_cols(Weights) # Weights also has 61 rows, so we can add it on the right
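As an aside, if all we want is the posterior mean of \(\mu\) at each weight (without an HDI), a plain column mean suffices; here is a quick sketch:
# posterior mean of mu at each weight: one value per column of Link4.5,
# i.e., one per row of Weights
mu_mean <- apply(Link4.5, 2, mean)
length(mu_mean)   # 61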
The plot above is on the standardized weight scale. It will be easier to read on the natural scale, so let’s undo the standardization (original value = mean + z-score * sd) and then recompute our summaries.
Weights <-
Weights %>%
mutate(
# convert standardized weight back to the original weight scale for plotting
weight = mean(Howell$weight) + weight_s * sd(Howell$weight)
)
Avg4.5 <-
apply(Link4.5, 2, mean_hdi) %>%
bind_rows() %>%
bind_cols(Weights)
Ind4.5 <-
apply(Sim4.5, 2, mean_hdi) %>%
bind_rows() %>%
bind_cols(Weights)
p4.5 <-
gf_ribbon(ymin + ymax ~ weight, data = Ind4.5, fill = "blue") %>%
gf_ribbon(ymin + ymax ~ weight, data = Avg4.5, fill = "red") %>%
gf_point(
title = "degree: 2",
height ~ weight, data = Howell, inherit = FALSE,
shape = 21, alpha = 0.4)
p4.5
This is clearly not a perfect model, but it is an improvement over a line. How much does the degree of the polynomial matter? Below we fit a cubic (degree 3) and a quartic (degree 4) model, and refit the linear (degree 1) model for comparison.
Howell <-
Howell %>%
mutate(
weight_s3 = weight_s^3,
weight_s4 = weight_s^4
)
Weights <-
Weights %>%
mutate(
weight_s3 = weight_s^3,
weight_s4 = weight_s^4,
weight = mean(Howell$weight) + weight_s * sd(Howell$weight)
)
m4.6 <-
quap(
data = Howell,
alist(
height ~ dnorm(mu, sigma),
mu <- beta_0 + beta_1 * weight_s +
beta_2 * weight_s2 +
beta_3 * weight_s3,
beta_0 ~ dnorm(178, 20),
beta_1 ~ dlnorm(0, 1),
beta_2 ~ dnorm(0, 1),
beta_3 ~ dnorm(0, 1),
sigma ~ dunif(0, 50)
)
)
Link4.6 <- link(m4.6, data = Weights)
Sim4.6 <- sim(m4.6, data = Weights)
Avg4.6 <-
apply(Link4.6, 2, mean_hdi) %>%
bind_rows() %>%
bind_cols(Weights)
Ind4.6 <-
apply(Sim4.6, 2, mean_hdi) %>%
bind_rows() %>%
bind_cols(Weights)
p4.6 <-
gf_ribbon(ymin + ymax ~ weight, data = Ind4.6, fill = "blue") %>%
gf_ribbon(ymin + ymax ~ weight, data = Avg4.6, fill = "red") %>%
gf_point(
title = "degree: 3",
height ~ weight, data = Howell, inherit = FALSE,
shape = 21, alpha = 0.4)
m4.6a <-
quap(
data = Howell,
alist(
height ~ dnorm(mu, sigma),
mu <- beta_0 + beta_1 * weight_s +
beta_2 * weight_s2 +
beta_3 * weight_s3 +
beta_4 * weight_s4,
beta_0 ~ dnorm(178, 20),
beta_1 ~ dlnorm(0, 1),
beta_2 ~ dnorm(0, 1),
beta_3 ~ dnorm(0, 1),
beta_4 ~ dnorm(0, 1),
sigma ~ dunif(0, 50)
)
)
Link4.6a <- link(m4.6a, data = Weights)
Sim4.6a <- sim(m4.6a, data = Weights)
Avg4.6a <-
apply(Link4.6a, 2, mean_hdi) %>%
bind_rows() %>%
bind_cols(Weights)
Ind4.6a <-
apply(Sim4.6a, 2, mean_hdi) %>%
bind_rows() %>%
bind_cols(Weights)
p4.6a <-
gf_ribbon(ymin + ymax ~ weight, data = Ind4.6a, fill = "blue") %>%
gf_ribbon(ymin + ymax ~ weight, data = Avg4.6a, fill = "red") %>%
gf_point(
title = "degree: 4",
height ~ weight, data = Howell, inherit = FALSE,
shape = 21, alpha = 0.4)
m4.6l <-
quap(
data = Howell,
alist(
height ~ dnorm(mu, sigma),
mu <- beta_0 + beta_1 * weight_s,
beta_0 ~ dnorm(178, 20),
beta_1 ~ dlnorm(0, 1),
sigma ~ dunif(0, 50)
)
)
Link4.6l <- link(m4.6l, data = Weights)
Sim4.6l <- sim(m4.6l, data = Weights)
Avg4.6l <-
apply(Link4.6l, 2, mean_hdi) %>%
bind_rows() %>%
bind_cols(Weights)
Ind4.6l <-
apply(Sim4.6l, 2, mean_hdi) %>%
bind_rows() %>%
bind_cols(Weights)
p4.6l <-
gf_ribbon(ymin + ymax ~ weight, data = Ind4.6l, fill = "blue") %>%
gf_ribbon(ymin + ymax ~ weight, data = Avg4.6l, fill = "red") %>%
gf_point(
title = "degree: 1",
height ~ weight, data = Howell, inherit = FALSE,
shape = 21, alpha = 0.4)
library(patchwork)   # for arranging the four plots in a 2 x 2 grid
( p4.6l | p4.5 ) / (p4.6 | p4.6a)
precis(m4.6l) %>% pander()
|        |  mean  |   sd   |  5.5%  | 94.5% |
|--------|--------|--------|--------|-------|
| beta_0 | 138.3  | 0.4006 | 137.6  | 138.9 |
| beta_1 | 25.94  | 0.4012 | 25.3   | 26.58 |
| sigma  | 9.346  | 0.2833 | 8.893  | 9.799 |
precis(m4.5) %>% pander()
|        |  mean  |   sd   |  5.5%  | 94.5%  |
|--------|--------|--------|--------|--------|
| beta_0 | 146.1  | 0.369  | 145.5  | 146.6  |
| beta_1 | 21.73  | 0.2889 | 21.27  | 22.19  |
| beta_2 | -7.803 | 0.2742 | -8.242 | -7.365 |
| sigma  | 5.775  | 0.1765 | 5.493  | 6.057  |
precis(m4.6) %>% pander()
|        |  mean  |   sd   |  5.5%  | 94.5%  |
|--------|--------|--------|--------|--------|
| beta_0 | 146.4  | 0.31   | 145.9  | 146.9  |
| beta_1 | 15.22  | 0.4763 | 14.46  | 15.98  |
| beta_2 | -6.205 | 0.2572 | -6.616 | -5.794 |
| beta_3 | 3.584  | 0.2288 | 3.218  | 3.949  |
| sigma  | 4.83   | 0.147  | 4.595  | 5.065  |
precis(m4.6a) %>% pander()
|        |  mean   |   sd   |  5.5%  | 94.5%   |
|--------|---------|--------|--------|---------|
| beta_0 | 145.5   | 0.3541 | 145    | 146.1   |
| beta_1 | 16.18   | 0.5143 | 15.36  | 17      |
| beta_2 | -3.758  | 0.551  | -4.639 | -2.878  |
| beta_3 | 2.776   | 0.28   | 2.329  | 3.224   |
| beta_4 | -0.9843 | 0.1958 | -1.297 | -0.6713 |
| sigma  | 4.84    | 0.148  | 4.603  | 5.077   |
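Since \(\sigma\) describes the individual-level variability left over after the mean function does its work, one quick way to compare the fits numerically is to pull out the estimate of \(\sigma\) from each model. Here is a minimal sketch (assuming, as in the rethinking package, that coef() returns the posterior means of a quap fit):
# posterior mean of sigma for each model; smaller sigma means the mean
# function is accounting for more of the variation in height
sapply(
  list(linear = m4.6l, quadratic = m4.5, cubic = m4.6, quartic = m4.6a),
  function(m) coef(m)["sigma"]
)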
Finally, let’s look at the posterior samples themselves to see how the parameters within each model are related to one another.
Post4.6l <- m4.6l %>% extract.samples(1e4)
Post4.5 <- m4.5 %>% extract.samples(1e4)
Post4.6 <- m4.6 %>% extract.samples(1e4)
Post4.6a <- m4.6a %>% extract.samples(1e4)
library(GGally)
ggpairs(Post4.6l, lower = list(continuous = "density"))
ggpairs(Post4.5, lower = list(continuous = "density"))
ggpairs(Post4.6, lower = list(continuous = "density"))
ggpairs(Post4.6a, lower = list(continuous = "density"))
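If you prefer numbers to pictures, the same information can be summarized with a correlation matrix of the posterior samples. A small sketch for the quadratic model (the other models work the same way):
# pairwise correlations among posterior samples for the quadratic model --
# the numeric counterpart of the ggpairs() plot above
Post4.5 %>% as.data.frame() %>% cor() %>% round(2)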
Note: the word “linear” gets used in more than one way in regression. \(f(x) = \beta_0 + \beta_1 x\) is linear in \(x\), and that’s probably what you think of when you think of linear. But \(g(x) = \beta_0 + \beta_1 x + \beta_2 \log(x)\) is linear in the \(\beta\)’s, and that is usually more important. So a model with regression function \(g\) is also considered a linear model. This can be seen another way using linear algebra: \(g(X) = X \beta\), where \(X\) is a 3-column matrix with a column of 1’s, a column of values of \(x\), and a column of values of \(\log(x)\), and \(\beta\) is a 1-column matrix containing \(\beta_0\), \(\beta_1\), and \(\beta_2\).↩︎
Be careful here. Two things can deceive your eye. (1) Where there is more data, there are more chances to get values farther from average. This is not additional variability. (2) Variability always looks more pronounced where the trend is flatter and less pronounced where the trend is steeper. In this case, we have more data where the trend is flatter, so both factors are going to make the differences in variability look larger than they are.↩︎
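To make footnote 1 concrete: in R, the quadratic mean function can be written as a single matrix product, which is what “linear in the \(\beta\)’s” means in practice. A small sketch, reusing the Howell data and the (rounded) posterior means from precis(m4.5) above:
# design matrix with columns 1, weight_s, weight_s2
X <- model.matrix(~ weight_s + weight_s2, data = Howell)
# rounded posterior means of beta_0, beta_1, beta_2 from precis(m4.5)
beta <- c(146.1, 21.73, -7.80)
# mu = X %*% beta: one fitted mean per row of Howell -- linear in beta
mu <- X %*% beta
head(mu)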