Goals for today

  1. Express models in mathematical notation

  2. Convert mathematical notation into quap() code

  3. Think a bit about priors

  4. Investigate models using prior sampling & posterior sampling

    • dpqr-functions

Side Note: A bad idea

McElreath says

I use the name d over and over again in this book to refer to the data frame we are working with at the moment. I keep its name short to save you typing.

This is my least favorite feature of this text. It is very error prone (easy to grab the wrong data) and confusing (which data is d just now?). The data being used are a really important part of your model – you always want to know what the data are. So… I will not follow this practice and I encourage you to avoid it as well.

I will usually capitalize the names of data frames and use lower case for the variables inside them. This helps to recognize which is which. But it is an uncommon convention in the R community. (The use of capitalization in the R world is mostly haphazard with some preference for lower case. You will see CamelCase, camelCase, snake_case, dot.case, and sometimes all mixed together in the same code. All I can say about this is “I’m sorry”.)

dpqr-functions

In R, there are typically 4 functions associated with each distribution. Suppose \(X \sim {\sf Dist}(...)\), where \({\sf Dist}\) is a placeholder for the distribution family and \(...\) are the parameters of the distribution.

function          key word      what it computes
ddist(x, ...)     density       the pmf or pdf function for \(X\)
pdist(q, ...)     probability   \(P(X \le q)\)
qdist(p, ...)     quantile      the smallest \(q\) such that \(P(X \le q) \ge p\)
rdist(n, ...)     random        generate n random draws from the distribution

In addition, the gf_dist() function can be used to plot a distribution.

Many examples of dist: norm, beta, unif, exp, gamma, binom, lnorm, triangle, t, etc.
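
For example, with \(X \sim {\sf Binom}(10, 0.3)\) (a small added illustration):

dbinom(3, size = 10, prob = 0.3)         # P(X = 3)
pbinom(3, size = 10, prob = 0.3)         # P(X <= 3)
qbinom(0.5, size = 10, prob = 0.3)       # smallest q with P(X <= q) >= 0.5
rbinom(5, size = 10, prob = 0.3)         # 5 random draws
gf_dist('binom', size = 10, prob = 0.3)  # plot the distribution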

Notes

  • If you put an x in front of some (the most common ones) of these you get something extra – a plot.

    xpnorm(120, mean = 100, sd = 15)  

    xpbeta(0.8, shape1 = 8, shape2 = 2)

  • The p-function and q-function are inverses. The p-function takes a value as input and returns a probability. The q-function takes a probability as input and returns a value.

    # using default values of mean = 0, sd = 1
    qnorm(pnorm(2))
    ## [1] 2
    pnorm(qnorm(0.2))
    ## [1] 0.2
  • These functions are vectorized. Vectorization happens over all arguments simultaneously. (This is why we needed functions like map_dbl() and map2_dbl() in our grid method examples.)

    qnorm(0.025, mean = c(10, 20, 30), sd = c(1, 2, 3))
    ## [1]  8.040036 16.080072 24.120108
    pnorm(c(12, 22, 21), mean = c(10, 20, 30), sd = c(1, 2, 3))
    ## [1] 0.977249868 0.841344746 0.001349898
    runif(3, min = 1:3, max = 2:4)
    ## [1] 1.720904 2.875773 3.760982


Some Example Models

Howell data set

Unlike most packages in R, the rethinking package does not use lazy data. This means you must explicitly load the data sets each time you use them. Here’s how:

library(rethinking)     # load the package (already done in our template Rmd)
data(Howell1)           # load a data set
precis(Howell1)         # precis is a fancy word for summary
##               mean         sd      5.5%     94.5%     histogram
## height 138.2635963 27.6024476 81.108550 165.73500 ▁▁▁▁▁▁▁▂▁▇▇▅▁
## weight  35.6106176 14.7191782  9.360721  54.50289 ▁▂▃▂▂▂▂▅▇▇▃▂▁
## age     29.3443934 20.7468882  1.000000  66.13500     ▇▅▅▃▅▂▂▁▁
## male     0.4724265  0.4996986  0.000000   1.00000    ▇▁▁▁▁▁▁▁▁▇

For our models below, we want just the adults. So let’s create a new (and clearly named) data set with just the adults.

HowellAdults <-
  Howell1 %>%
  filter(age >= 18)   # filter selects only rows that match the condition

To find out more about this data set, type ?Howell1.

With data in hand, we are ready to model.

Model 1: Everybody is Normal

Model 1 (Math)

McElreath proposes this as a first model for heights of these adults.

\[\begin{align*} h_i & \sim \sf{Norm}(\mu, \sigma) \\ \mu & \sim \sf{Norm}(178, 20) \\ \sigma & \sim \sf{Unif}(0, 50) \end{align*}\]

  • Why Norm(178, 20)?
    • McElreath is 178 cm tall!
    • A standard deviation of 20 makes any value between 138 and 218 cm fairly plausible. (Middle 95% of a normal distribution.)
    • So this is a pretty weak prior – it doesn’t specify very precisely what the mean height might be.


  • Why Unif(0, 50)?

    • This is not a good prior, but it is easy and it expresses that the standard deviation must be positive.
    • A standard deviation of 50 would be pretty crazy. That would suggest that the tallest 2.5% of heights are about 200 cm taller than the shortest 2.5%. (See the quick check after this list.)
    • We’ll talk about better priors later.
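
We can check both of these claims with the q-functions (a quick added sketch):

qnorm(c(0.025, 0.975), mean = 178, sd = 20)      # middle 95% of the Norm(178, 20) prior: roughly 139 to 217 cm
diff(qnorm(c(0.025, 0.975), mean = 0, sd = 50))  # spread of the middle 95% when sd = 50: about 196 cm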

Prior Predictive Sampling

We can sample from our priors and use the sampled values of \(\mu\) and \(\sigma\) to compute some heights. This will tell us what our prior is saying about observed heights.

Prior4.1 <- 
  tibble(
    mu =     rnorm(1e4, 178, 20),
    sigma =  runif(1e4, 0, 50),
    height = rnorm(1e4, mu, sigma)
  )
head(Prior4.1)
## # A tibble: 6 x 3
##      mu sigma height
##   <dbl> <dbl>  <dbl>
## 1  202. 47.7    167.
## 2  159.  4.86   166.
## 3  178. 37.2    198.
## 4  224. 44.7    239.
## 5  157. 33.0    122.
## 6  117.  8.03   121.
gf_dens(~height, data = Prior4.1)

Note that if we go overboard with our wide prior, we get silly results:

Prior4.1a <-
  tibble(
    mu = rnorm(1e4, 178, 100),
    sigma = runif(1e4, 0, 50),
    height = rnorm(1e4, mu, sigma)
  ) 

last_plot() %>%
  gf_dens( ~height, color = "red",
           data = Prior4.1a
  )

Changing the standard deviation of our prior for \(\mu\) from 20 to 100 says that a noticeable proportion of the population will have heights that are less than 0 cm or greater than 400 cm (4 meters). That clearly does not reflect what we know about people, so it would be a poor choice for our prior.
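
We can quantify “a noticeable proportion” directly from the prior samples (an added check):

Prior4.1a %>%
  summarise(
    prop_below_0   = mean(height < 0),     # impossible heights
    prop_above_400 = mean(height > 400)    # heights over 4 meters
  )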

We can use this sort of prior predictive sampling to investigate the impact of our choice of priors.

Let’s try one more, this time with an extremely small standard deviation for the prior on \(\mu\):

Prior4.1b <-
  tibble(
    mu = rnorm(1e4, 178, 0.1),
    sigma = runif(1e4, 0, 50),
    height = rnorm(1e4, mu, sigma)
  ) 
last_plot() %>%
  gf_dens( ~ height, color = "steelblue",
           data = Prior4.1b
  )

hdi(Prior4.1,  pars = "height")
hdi(Prior4.1b, pars = "height")
##      par       lo       hi     mode prob
## 1 height 108.6064 250.5676 177.6762 0.95
##      par      lo       hi     mode prob
## 1 height 114.168 237.8867 177.9269 0.95

Because of the wide prior for \(\sigma\), this doesn’t change the prior predictions as much as you might have expected. But stay tuned for a look at what this does to the posterior.
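
One way to see why (an added sketch): the variance of the prior predictive heights is \(\mathrm{Var}(\mu) + \mathrm{E}[\sigma^2]\), and with \(\sigma \sim \sf{Unif}(0, 50)\) the second term is \(50^2/3 \approx 833\), so shrinking \(\mathrm{Var}(\mu)\) from \(400\) to \(0.01\) changes the total spread less than you might expect. Comparing the two prior predictive standard deviations:

sd( ~ height, data = Prior4.1)     # prior with mu ~ Norm(178, 20)
sd( ~ height, data = Prior4.1b)    # prior with mu ~ Norm(178, 0.1)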

Model 1 (quap)

Now let’s translate our model into quap() code. Here’s the model:

\[\begin{align*} h_i & \sim \sf{Norm}(\mu, \sigma) \\ \mu & \sim \sf{Norm}(178, 20) \\ \sigma & \sim \sf{Unif}(0, 50) \end{align*}\]

And here’s the code:

library(rethinking)
m4.1 <- 
  quap(
    data = HowellAdults,
    alist(
      height ~ dnorm(mu, sigma),
      mu     ~ dnorm(178, 20),
      sigma  ~ dunif(0, 50)
    )
  )

Things to note:

  • Put the model description inside alist()

    • alist() creates an unevaluated list. This lets quap() process the code itself, rather than what the code evaluates to.
    • If that makes no sense to you, just remember: use alist() when describing the model.
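
    A tiny base-R illustration of the difference (an added example):

    list(a = 1 + 1)     # evaluates its arguments: element a is 2
    alist(a = 1 + 1)    # leaves them unevaluated: element a is the expression 1 + 1
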
  • Use <- for direct computations (arithmetic).

  • Use ~ for priors and likelihood. Think ~ for distributions.

  • quap() is doing some tricky things behind the scenes

    • mu ~ dnorm(178, 20) will turn into dnorm(mu, 178, 20, log = TRUE)
    • this makes the code look a lot like the mathematical notation (slick)
  • quap() can fail.

    The most likely failure you will encounter early on is caused by a bad starting point for the search.

    Unless you tell it otherwise, quap() uses prior sampling to pick a place to start its search. So just rerunning will often fix the problem. (Choosing a reasonable prior also helps.) If you use set.seed(), then you will get the same random starting point each time you run the code. If things fail, change your seed and try again.

    You can specify a starting point with the start argument to quap().

    quap(
      data = HowellAdults,
      alist(
        height ~ dnorm(mu, sigma),
        mu     ~ dnorm(178, 20),
        sigma  ~ dunif(0, 50)
      ),
      start = list(mu = 178, sigma = 5)
    )

    What makes a starting point bad? quap() uses a “hill climbing” algorithm to search for the posterior mode. If the starting point is very far from the posterior mode and the posterior is very flat at the starting point, it is hard to know in which direction to begin the search for the mode – the local terrain doesn’t give much of a clue. So quap() may wander around aimlessly in a vast plain of low posterior density and never get close to the posterior mode. Also, if the posterior is multi-modal, we might get drawn to the wrong local maximum.

Investigating the model

Quick summary of marginal posterior distributions:

library(rethinking)
precis(m4.1)
##             mean        sd       5.5%      94.5%
## mu    154.606969 0.4119838 153.948539 155.265398
## sigma   7.731128 0.2913667   7.265468   8.196789

5.5% and 94.5% indicate that we are seeing 89% HDIs.

Why 89%? It’s just the default… I don’t recommend 95% intervals because readers will have a hard time not viewing them as significance tests….

McElreath probably goes a bit overboard here, but he is correct that 95% is just a common convention; there is nothing particularly special about it. And when working with normal distributions, it doesn’t matter what percentage you use since they all communicate exactly the same information. (Once you know any percentage, you can work out the standard deviation and mean, so you know everything there is to know.) But it isn’t clear that choosing a different arbitrary number (89%) is any better than using the conventional arbitrary number (95%). In any case, you can choose whatever percent you like:

precis(m4.1, prob = .92)
##             mean        sd         4%       96%
## mu    154.606969 0.4119838 153.885714 155.32822
## sigma   7.731128 0.2913667   7.221037   8.24122
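
As an aside, the “same information” claim above can be checked numerically from the precis(m4.1) output (an added sketch):

half_width <- (155.265398 - 153.948539) / 2   # half of the 89% interval for mu
half_width / qnorm(0.945)                     # recovers the posterior sd of mu, about 0.412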

Posterior samples

Post4.1 <- extract.samples(m4.1)
gf_histogram(~mu, data = Post4.1) /
  gf_histogram(~sigma, data = Post4.1) 

gf_point(mu ~ sigma, data = Post4.1, alpha = 0.3) %>%
  gf_density2d(color = "skyblue")

That narrow prior

Let’s take a look at how the posterior changes if we use the super narrow prior for \(\mu\):

m4.2 <- 
  quap(
    data = HowellAdults,
    alist(
      height ~ dnorm(mu, sigma),
      mu     ~ dnorm(178, 0.1),
      sigma  ~ dunif(0, 50)
    )
  )
Post4.2 <- extract.samples(m4.2)
gf_dens(~ mu, data = Post4.1, color = ~ "Model 1") %>%
  gf_dens(~ mu, data = Post4.2, color = ~ "Model 2") 
gf_density2d( mu ~ sigma, data = Post4.1, color = ~"Model 1") %>%
gf_density2d( mu ~ sigma, data = Post4.2, color = ~"Model 2")

Notice that the two models have very different posterior ideas about what \(\mu\) and \(\sigma\) might be. Do you see why?


Model 3: Height depends (linearly) on weight

Model 3 (Math)

Here’s one way to express this model:

\[\begin{align*} h_i & \sim \sf{Norm}(\mu_i, \sigma) \\ \mu_i & = \beta_0 + \beta_1 * weight_i \\ \beta_0 & \sim ?? \\ \beta_1 & \sim ?? \\ \sigma & \sim ?? \\ \end{align*}\]

But we will prefer a different way:

\[\begin{align*} h_i & \sim \sf{Norm}(\mu_i, \sigma) \\ \mu_i & = \alpha_0 + \beta_1 \cdot (weight_i - \overline{weight}) \\ \alpha_0 & \sim ?? \\ \beta_1 & \sim ?? \\ \sigma & \sim ?? \\ \end{align*}\]

Why do we prefer this centered model?

  1. It is easier to come up with a prior for \(\alpha_0\) than for \(\beta_0\) because \(\beta_0\) is hard to interpret. (On average, how tall are people who weigh nothing?)

  2. Centering reduces or eliminates correlation among parameters in the posterior.

Choosing priors and prior sampling

  • Since \(\alpha_0\) indicates the average height for a person of average weight, it makes sense to use the same prior as in our previous model.
  • We will also use the same (poor) prior for \(\sigma\).
  • Let’s focus on the prior for \(\beta_1\).

Which prior looks better?

mean_weight = mean( ~ weight, data = HowellAdults)
Prior4.3a <-
  tibble(
    alpha_0 = rnorm(100, 178, 20),
    beta_1  = rnorm(100, 0, 10),
    sigma   = runif(100, 0, 50)
  ) 
Prior4.3b <-  
  tibble(
    alpha_0 = rnorm(100, 178, 20),
    beta_1  = exp(rnorm(100, 0, 1)),
    sigma   = runif(100, 0, 50)
  ) 
Prior4.3a %>%
  gf_abline(slope = ~ beta_1, intercept = ~ alpha_0 - beta_1 * mean_weight,
            alpha = 0.3) %>%
  gf_hline(yintercept = ~ c(0, 272), color = "red", linetype = "dashed", data = NA) %>%
  gf_lims(x = c(30, 60), y = c(-100, 400)) %>%
  gf_labs(y = "height") /
 
Prior4.3b %>%
  gf_abline(slope = ~ beta_1, intercept = ~ alpha_0 - beta_1 * mean_weight, 
            alpha = 0.3) %>%
  gf_hline(yintercept = ~ c(0, 272), color = "red", linetype = "dashed", data = NA) %>%
  gf_lims(x = c(30, 60), y = c(-100, 400)) %>%
  gf_labs(y = "height") 

Let’s go with the second one.

\[\begin{align*} h_i & \sim \sf{Norm}(\mu_i, \sigma) \\ \mu_i & = \alpha_0 + \beta_1 \cdot (weight_i - \overline{weight}) \\ \alpha_0 & \sim \sf{Norm}(178, 20) \\ \beta_1 & \sim \sf{LogNorm}(0, 1) \\ \sigma & \sim \sf{Unif}(0, 50) \\ \end{align*}\]

If \(X\) is normal, why is \(e^X\) called lognormal? Because its logarithm is normally distributed. Confusing, but true.

Let \(X \sim \sf{LogNorm}(\mu, \sigma)\). That is, \(X = e^Y\) where \(Y \sim \sf{Norm}(\mu, \sigma)\). In R, the parameters are called meanlog and sdlog. Here are some example plots.

gf_dist('lnorm', meanlog = 0, sdlog = 1, 
        fill = ~ "LogNorm(0,1)", alpha = 0.3, geom = "area") %>%
# gf_dist('lnorm', meanlog = 0, sdlog = 2, 
#         fill = ~ "LogNorm(0,2)", alpha = 0.3, geom = "area") %>%
gf_dist('lnorm', meanlog = 0, sdlog = 1/2, 
        fill = ~ "LogNorm(0,1/2)", alpha = 0.3, geom = "area") %>%
gf_dist('lnorm', meanlog = 1, sdlog = 1, 
        fill = ~ "LogNorm(1,1)", alpha = 0.3, geom = "area") %>%
  gf_lims(x = c(0,15))
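
A quick added check that the definition above matches R’s lnorm functions: the quartiles of exponentiated normal draws should agree with qlnorm() (up to simulation noise).

set.seed(123)
quantile(exp(rnorm(1e5, 0, 1)), c(0.25, 0.5, 0.75))   # exp() of Norm(0, 1) draws
qlnorm(c(0.25, 0.5, 0.75), meanlog = 0, sdlog = 1)    # quartiles of LogNorm(0, 1)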

The \(\mu\) parameter (and not \(\sigma\)) determines the median of the log-normal distribution: since \(P(e^Y \le e^{\mu}) = P(Y \le \mu) = 0.5\), the median of \(X = e^Y\) is \(e^{\mu}\).

qlnorm(0.5, meanlog = 0, sdlog = 0:4)
## [1] 1 1 1 1 1
qlnorm(0.5, meanlog = 1, sdlog = 0:4)
## [1] 2.718282 2.718282 2.718282 2.718282 2.718282

The following table of statistics (also available at https://en.wikipedia.org/wiki/Log-normal_distribution) can be useful in determining good choices for \(\mu\) and \(\sigma\) in a prior.

statistic   expression
median      \(e^{\mu} = \exp(\mu)\)
mean        \(\exp\left(\mu + \frac{\sigma^2}{2}\right)\)
mode        \(\exp(\mu - \sigma^2)\)
variance    \(\left[\exp(\sigma^2) - 1\right] \exp(2\mu + \sigma^2)\)

So larger \(\mu\) and larger \(\sigma\) make for larger variance, but the impact of \(\sigma\) is greater (because of the squaring).
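
For example, here are these statistics for the \(\sf{LogNorm}(0, 1)\) prior we chose for \(\beta_1\) (an added check):

mu <- 0; sigma <- 1
c(median   = exp(mu),
  mean     = exp(mu + sigma^2 / 2),
  mode     = exp(mu - sigma^2),
  variance = (exp(sigma^2) - 1) * exp(2 * mu + sigma^2))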

Model 3 (quap)

m4.3 <- 
  quap(
    data = HowellAdults,
    alist(
      height ~ dnorm(mu, sigma),
      mu    <- alpha_0 + beta_1 * (weight - mean(weight)),
      alpha_0 ~ dnorm(178, 20),
      beta_1 ~ dlnorm(0, 1),
      sigma  ~ dunif(0, 50)
    )
  )

Posterior Explorations

Anything that depends on parameters has a posterior distribution.

Post4.3 <- m4.3 %>% extract.samples(1e4) %>%
  mutate(beta_0 = alpha_0 - beta_1 * mean(HowellAdults$weight))
( 
  gf_dens( ~ alpha_0 , data = Post4.3) |
  gf_blank(0 ~ 0)  %>% gf_theme(theme_void())
) /
  (
    gf_density2d(beta_1 ~ alpha_0, data = Post4.3) |
      gf_dens(beta_1 ~ ., data = Post4.3) 
  )

library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
## 
## Attaching package: 'GGally'
## The following object is masked from 'package:pander':
## 
##     wrap
ggpairs(Post4.3, upper = list(continuous = "density"))

Posterior Lines

gf_point(height ~ weight, data = HowellAdults) %>%
  gf_abline(
    slope = ~ beta_1, intercept = ~ beta_0, 
    color = "steelblue", alpha = 0.1,
    data = Post4.3 %>% head(100))


Model 4: Height depends on weight and sex

Model 4 (Math)

\[\begin{align*} h_i & \sim \sf{Norm}(\mu_i, \sigma) \\ \mu_i & = \alpha_0 + \beta_1 \cdot (weight_i - \overline{weight}) + \beta_2 \cdot male_i \\ \alpha_0 & \sim \sf{Norm}(178, 20) \\ \beta_1 & \sim \sf{LogNorm}(0, 1) \\ \beta_2 & \sim \sf{Norm}(0, 10) \\ \sigma & \sim \sf{Unif}(0, 50) \\ \end{align*}\]

Model 4 (quap)

Note: The male variable is either 0 (for females) or 1 (for males).
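
A quick added look at how male is coded in these data:

tally( ~ male, data = HowellAdults)        # counts of 0 (female) and 1 (male)
mean(height ~ male, data = HowellAdults)   # average height within each group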

m4.4 <- 
  quap(
    data = HowellAdults,
    alist(
      height ~ dnorm(mu, sigma),
      mu    <- alpha_0 + beta_1 * (weight - mean(weight)) + beta_2 * male,
      alpha_0 ~ dnorm(178, 20),
      beta_1 ~ dlnorm(0, 1),
      beta_2 ~ dnorm(0, 10),
      sigma  ~ dunif(0, 50)
    )
  )
Post4.4 <- extract.samples(m4.4, n = 1e3) %>%
  mutate(beta_0 = alpha_0 - beta_1 * mean_weight)
precis(m4.4)
##                mean         sd        5.5%       94.5%
## alpha_0 151.5613770 0.33723981 151.0224027 152.1003514
## beta_1    0.6407607 0.04124548   0.5748424   0.7066789
## beta_2    6.4840091 0.53268348   5.6326780   7.3353402
## sigma     4.2537758 0.16031290   3.9975649   4.5099868
ggpairs(Post4.4, upper = list(continuous = "density"))

gf_point(height ~ weight, color = ~male, data = HowellAdults) %>%
  gf_abline(slope = ~ beta_1, intercept = ~ beta_0, alpha = 0.1,
            color = ~0,
            data = Post4.4 %>% head(100)) %>%
  gf_abline(slope = ~ beta_1, intercept = ~ beta_0 + beta_2, alpha = 0.1,
            color = ~1,
            data = Post4.4 %>% head(100))

last_plot() %>%
  gf_facet_grid( ~ male)

More Posterior Predictions

In this section, we want to refine our approach of plotting 100 posterior lines on the plot in two ways:

  1. Instead of 100 individual lines, we would like to see the upper and lower bounds that contain some fraction of the lines (at each weight).

  2. In addition to plotting the lines, which represent average height, we would like to see a range of plausible values for individual heights (at each weight).

Before dealing with each of these at each weight, we’ll start by looking at just one weight and then figure out how to scale up.

Average and individual heights for people who weigh 50 kg

What does our model say about the heights of people who weigh 50 kg?

  • On average their height should be \(\mu = \alpha_0 + \beta_1 (50 - \overline{weight})\)

  • An individual height should be a draw from a \(\sf{Norm}(\alpha_0 + \beta_1 (50 - \overline{weight}), \sigma)\)-distribution.

Keep two things in mind

  1. We don’t have values for the parameters (\(\alpha_0\), \(\beta_1\), and \(\sigma\)); we have (posterior) distributions. So we will have distributions as answers to these new questions as well. We may choose to summarize the distributions with a number (perhaps the mean or the median), or a range (an HDI), but these are fundamentally distributions.

  2. The posterior distribution is what the model says (based on the data, the likelihood, and the priors). That doesn’t make it true. So in addition to figuring out what a model says, we will want to investigate whether the model should be believed in the first place. Part of that can be determined by comparing what the model says to what we know or think we know, including by comparing things to the data used to fit the model. (But performing well on the data used to fit the model is a very low bar – and fitting very well can actually be a bad thing. More on this later.)

OK. Let’s compute those posterior distributions. We’ll do this by adding additional columns to our posterior samples.

n <- nrow(Post4.3)
Post4.3 <-
  Post4.3 %>%
  mutate(
    mean50 = beta_0 + beta_1 * 50,
    individual50 = rnorm(n, mean50, sigma)
  )

Post4.3 %>%
  gf_dens( ~ mean50, color = ~ "mean50") %>%
  gf_dens( ~ individual50, color = ~ "individual50") %>%
  gf_labs(x = "height | weight = 50")

Post4.3 %>% mean_hdi(mean50)
##     mean50   .lower  .upper .width .point .interval
## 1 159.1285 158.4729 159.821   0.95   mean       hdi
Post4.3 %>% mean_hdi(individual50)
##   individual50  .lower   .upper .width .point .interval
## 1     159.0531 148.564 168.4919   0.95   mean       hdi
gf_point(height ~ weight, data = HowellAdults) %>%
  gf_pointrange(individual50 + .lower + .upper ~ 50, 
                color = ~ "individual", size = 1.2,
                data = Post4.3 %>% mean_hdi(individual50)) %>%
  gf_pointrange(mean50 + .lower + .upper ~ 50, 
                color = ~ "average", size = 0.4,
                data = Post4.3 %>% mean_hdi(mean50)) 

As we see (and as makes sense), the model makes much more precise estimates about the average than about individuals.

Now we would like to do this same thing for weights all across the range of weights in our data.

Scaling up – doing it ourselves

First, let’s write a couple of functions that take posterior samples and a weight and produce an HDI for the mean or individual height.

avg_height <- function(post, weight, prob = 0.95) {
  post %>%
    mutate(mean_height = beta_0 + beta_1 * weight) %>%
    mean_hdi(mean_height, .width = prob) %>%
    mutate(weight = weight) 
}

ind_height <- function(post, weight, prob = 0.95) {
  n <- nrow(post)
  post %>%
    mutate(ind_height = rnorm(n, beta_0 + beta_1 * weight, sigma)) %>%
    mean_hdi(ind_height, .width = prob) %>%
    mutate(weight = weight)
}

avg_height(Post4.3, 50)
##   mean_height   .lower  .upper .width .point .interval weight
## 1    159.1285 158.4729 159.821   0.95   mean       hdi     50
ind_height(Post4.3, 50)
##   ind_height   .lower   .upper .width .point .interval weight
## 1   159.2391 149.1294 169.1683   0.95   mean       hdi     50
avg_height(Post4.3, 40)
##   mean_height   .lower   .upper .width .point .interval weight
## 1    150.0943 149.4073 150.7312   0.95   mean       hdi     40
ind_height(Post4.3, 40)
##   ind_height   .lower   .upper .width .point .interval weight
## 1    150.124 139.8024 159.9972   0.95   mean       hdi     40

Now we want to apply these functions to a lot of different weights.

Avg <- map_dfr(seq(30, 65, by = 1), ~ avg_height(Post4.3, .x))
Ind <- map_dfr(seq(30, 65, by = 1), ~ ind_height(Post4.3, .x))
Avg %>% reactable::reactable()
Ind %>% reactable::reactable()

And finally, we can plot the results

gf_point(height ~ weight, data = HowellAdults) %>%
  gf_line(.lower ~ weight, data = Avg, color = ~"avg") %>%
  gf_line(.upper ~ weight, data = Avg, color = ~"avg") %>%
  gf_line(.lower ~ weight, data = Ind, color = ~"ind") %>%
  gf_line(.upper ~ weight, data = Ind, color = ~"ind") 

gf_ribbon(.lower + .upper ~ weight, data = Ind, fill = ~"ind") %>%
  gf_ribbon(.lower + .upper ~ weight, data = Avg, fill = ~"avg") %>%
  gf_point(height ~ weight, data = HowellAdults, inherit = FALSE) 

Notice that most but not all of the dots fall in the blue band. That’s a good sign. The model predicts that 95% of heights will fall in the blue region.
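
We can check that claim roughly against the data (an added sketch that matches each observed weight to the nearest value on the 1-kg grid used to build Ind):

HowellAdults %>%
  mutate(weight_grid = round(weight)) %>%
  left_join(Ind, by = c("weight_grid" = "weight")) %>%
  summarise(prop_in_band = mean(height >= .lower & height <= .upper, na.rm = TRUE))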

This is a useful exercise for understanding how posterior sampling works, and it prepares you for the day you need to do something special for which there is no pre-made tool.

But creating posterior distributions of averages and simulated individuals is very common, so we’d like a simpler way to get those.
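
The rethinking package provides link() and sim() for exactly this purpose. Here is a small preview sketch of how they are typically used; the details come later:

mu_post     <- link(m4.3)               # posterior samples of mu at each observed weight
height_post <- sim(m4.3)                # simulated individual heights at each observed weight
apply(mu_post, 2, PI, prob = 0.95)      # 95% intervals for the average height
apply(height_post, 2, PI, prob = 0.95)  # 95% intervals for individual heights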

Questions to ask about/with a model

We will continue to address all of these as we go along.