Goals for today

  1. Express models in mathematical notation

  2. Convert mathematical notation into quap() code

  3. Think a bit about priors

  4. Investigate models using prior sampling & posterior sampling

    • dpqr-functions

Side Note: A bad idea

McElreath says

I use the name d over and over again in this book to refer to the data frame we are working with at the moment. I keep its name short to save you typing.

This is my least favorite feature of this text. It is very error prone (easy to grab the wrong data) and confusing (which data is d just now?). The data being used are a really important part of your model – you always want to know what the data are. So… I will not follow this practice and I encourage you to avoid it as well.

I will usually capitalize the names of data frames and use lower case for the variables inside them. This helps to recognize which is which. But it is an uncommon convention in the R community. (The use of capitalization in the R world is mostly haphazard with some preference for lower case. You will see CamelCase, camelCase, snake_case, dot.case, and sometimes all mixed together in the same code. All I can say about this is “I’m sorry”.)

dpqr-functions

In R, there are typically 4 functions associated with each distribution. Suppose \(X \sim {\sf Dist}(...)\), where \({\sf Dist}\) is a placeholder for the distribution family and \(...\) are the parameters of the distribution.

function          key word      what it computes
ddist(x, ...)     density       the pmf or pdf function for \(X\)
pdist(q, ...)     probability   \(P(X \le q)\)
qdist(p, ...)     quantile      the smallest \(q\) such that \(P(X \le q) \ge p\)
rdist(n, ...)     random        generate n random draws from the distribution

In addition, the gf_dist() function can be used to plot a distribution.

Many examples of dist: norm, beta, unif, exp, gamma, binom, lnorm, triangle, t, etc.
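
For example, with \(X \sim {\sf Binom}(10, 0.3)\) (a small added illustration):

dbinom(3, size = 10, prob = 0.3)         # P(X = 3)
pbinom(3, size = 10, prob = 0.3)         # P(X <= 3)
qbinom(0.5, size = 10, prob = 0.3)       # smallest q with P(X <= q) >= 0.5
rbinom(5, size = 10, prob = 0.3)         # 5 random draws
gf_dist('binom', size = 10, prob = 0.3)  # plot the distribution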

Notes

  • If you put an x in front of some (the most common ones) of these you get something extra – a plot.

    xpnorm(120, mean = 100, sd = 15)  

    xpbeta(0.8, shape1 = 8, shape2 = 2)

  • The p-function and q-function are inverses. The p-function takes a value as input and returns a probability. The q-function takes a probability as input and returns a value.

    # using default values of mean = 0, sd = 1
    qnorm(pnorm(2))
    ## [1] 2
    pnorm(qnorm(0.2))
    ## [1] 0.2
  • These functions are vectorized. Vectorization happens over all arguments simultaneously. (This is why we needed functions like map_dbl() and map2_dbl() in our grid method examples.)

    qnorm(0.025, mean = c(10, 20, 30), sd = c(1, 2, 3))
    ## [1]  8.040036 16.080072 24.120108
    pnorm(c(12, 22, 21), mean = c(10, 20, 30), sd = c(1, 2, 3))
    ## [1] 0.977249868 0.841344746 0.001349898
    runif(3, min = 1:3, max = 2:4)
    ## [1] 1.720904 2.875773 3.760982


Some Example Models

Howell data set

Unlike most packages in R, the rethinking package does not use lazy data. This means you must explicitly load the data sets each time you use them. Here’s how:

library(rethinking)     # load the package (already done in our template Rmd)
data(Howell1)           # load a data set
precis(Howell1)         # precis is a fancy word for summary
##               mean         sd      5.5%     94.5%     histogram
## height 138.2635963 27.6024476 81.108550 165.73500 ▁▁▁▁▁▁▁▂▁▇▇▅▁
## weight  35.6106176 14.7191782  9.360721  54.50289 ▁▂▃▂▂▂▂▅▇▇▃▂▁
## age     29.3443934 20.7468882  1.000000  66.13500     ▇▅▅▃▅▂▂▁▁
## male     0.4724265  0.4996986  0.000000   1.00000    ▇▁▁▁▁▁▁▁▁▇

For our models below, we want just the adults. So let’s create a new (and clearly named) data set with just the adults.

HowellAdults <-
  Howell1 %>%
  filter(age >= 18)   # filter selects only rows that match the condition

To find out more about this data set, type ?Howell1.

With data in hand, we are ready to model.

Model 1: Everybody is Normal

Model 1 (Math)

McElreath proposes this as a first model for heights of these adults.

\[\begin{align*} h_i & \sim \sf{Norm}(\mu, \sigma) \\ \mu & \sim \sf{Norm}(178, 20) \\ \sigma & \sim \sf{Unif}(0, 50) \end{align*}\]

  • Why Norm(178, 20)?
    • McElreath is 178 cm tall!
    • A standard deviation of 20 makes any value between 138 and 218 cm fairly plausible. (Middle 95% of a normal distribution.)
    • So this is a pretty weak prior – it doesn’t specify very precisely what the mean height might be.


  • Why Unif(0, 50)?

    • This is not a good prior, but it is easy and it expresses that the standard deviation must be positive.
    • A standard deviation of 50 would be pretty crazy. That would suggest that the tallest 2.5% of heights are about 200 cm taller than the shortest 2.5%. (See the quick check after this list.)
    • We’ll talk about better priors later.
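
We can check both of these claims with the q-functions (a quick added sketch):

qnorm(c(0.025, 0.975), mean = 178, sd = 20)      # middle 95% of the Norm(178, 20) prior: roughly 139 to 217 cm
diff(qnorm(c(0.025, 0.975), mean = 0, sd = 50))  # spread of the middle 95% when sd = 50: about 196 cm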

Prior Predictive Sampling

We can sample from our priors and use the sampled values of \(\mu\) and \(\sigma\) to compute some heights. This will tell us what our prior is saying about observed heights.

Prior4.1 <- 
  tibble(
    mu =     rnorm(1e4, 178, 20),
    sigma =  runif(1e4, 0, 50),
    height = rnorm(1e4, mu, sigma)
  )
head(Prior4.1)
## # A tibble: 6 x 3
##      mu sigma height
##   <dbl> <dbl>  <dbl>
## 1  202. 47.7    167.
## 2  159.  4.86   166.
## 3  178. 37.2    198.
## 4  224. 44.7    239.
## 5  157. 33.0    122.
## 6  117.  8.03   121.
gf_dens(~height, data = Prior4.1)

Note that if we go overboard with our wide prior, we get silly results:

Prior4.1a <-
  tibble(
    mu = rnorm(1e4, 178, 100),
    sigma = runif(1e4, 0, 50),
    height = rnorm(1e4, mu, sigma)
  ) 

last_plot() %>%
  gf_dens( ~height, color = "red",
           data = Prior4.1a
  )

Changing the standard deviation of our prior for \(\mu\) from 20 to 100 says that a noticeable proportion of the population will have heights that are less than 0 cm or greater than 400 cm (4 meters). That clearly does not reflect what we know about people, so it would be a poor choice for our prior.
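
We can quantify “a noticeable proportion” directly from the prior samples (an added check):

Prior4.1a %>%
  summarise(
    prop_below_0   = mean(height < 0),     # impossible heights
    prop_above_400 = mean(height > 400)    # heights over 4 meters
  )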

We can use this sort of prior predictive sampling to investigate the impact of our choice of priors.

Let’s try one more, this time with an extremely small standard deviation for the prior on \(\mu\):

Prior4.1b <-
  tibble(
    mu = rnorm(1e4, 178, 0.1),
    sigma = runif(1e4, 0, 50),
    height = rnorm(1e4, mu, sigma)
  ) 
last_plot() %>%
  gf_dens( ~ height, color = "steelblue",
           data = Prior4.1b
  )

hdi(Prior4.1,  pars = "height")
hdi(Prior4.1b, pars = "height")
##      par       lo       hi     mode prob
## 1 height 108.6064 250.5676 177.6762 0.95
##      par      lo       hi     mode prob
## 1 height 114.168 237.8867 177.9269 0.95

Because of the wide prior for \(\sigma\), this doesn’t change the prior predictions as much as you might have expected. But stay tuned for a look at what this does to the posterior.
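
One way to see why (an added sketch): the variance of the prior predictive heights is \(\mathrm{Var}(\mu) + \mathrm{E}[\sigma^2]\), and with \(\sigma \sim \sf{Unif}(0, 50)\) the second term is \(50^2/3 \approx 833\), so shrinking \(\mathrm{Var}(\mu)\) from \(400\) to \(0.01\) changes the total spread less than you might expect. Comparing the two prior predictive standard deviations:

sd( ~ height, data = Prior4.1)     # prior with mu ~ Norm(178, 20)
sd( ~ height, data = Prior4.1b)    # prior with mu ~ Norm(178, 0.1)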

Model 1 (quap)

Now let’s translate our model into quap() code. Here’s the model:

\[\begin{align*} h_i & \sim \sf{Norm}(\mu, \sigma) \\ \mu & \sim \sf{Norm}(178, 20) \\ \sigma & \sim \sf{Unif}(0, 50) \end{align*}\]

And here’s the code:

library(rethinking)
m4.1 <- 
  quap(
    data = HowellAdults,
    alist(
      height ~ dnorm(mu, sigma),
      mu     ~ dnorm(178, 20),
      sigma  ~ dunif(0, 50)
    )
  )

Things to note:

  • Put the model description inside alist()

    • alist() creates an unevaluated list. This lets quap() process the code itself, rather than what the code evaluates to.
    • If that makes no sense to you, just remember: use alist() when describing the model.
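
    A tiny base-R illustration of the difference (an added example):

    list(a = 1 + 1)     # evaluates its arguments: element a is 2
    alist(a = 1 + 1)    # leaves them unevaluated: element a is the expression 1 + 1
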
  • Use <- for direct computations (arithmetic).

  • Use ~ for priors and likelihood. Think ~ for distributions.

  • quap() is doing some tricky things behind the scenes

    • mu ~ dnorm(178, 20) will turn into dnorm(mu, 178, 20, log = TRUE)
    • this makes the code look a lot like the mathematical notation (slick)
  • quap() can fail.

    The most likely failure you will encounter early on is caused by a bad starting point for the search.

    Unless you tell it otherwise, quap() uses prior sampling to pick a place to start its search. So just rerunning will often fix the problem. (Choosing a reasonable prior also helps.) If you use set.seed(), then you will get the same random starting point each time you run the code. If things fail, change your seed and try again.

    You can specify a starting point with the start argument to quap().

    quap(
      data = HowellAdults,
      alist(
        height ~ dnorm(mu, sigma),
        mu     ~ dnorm(178, 20),
        sigma  ~ dunif(0, 50)
      ),
      start = list(mu = 178, sigma = 5)
    )

    What makes a starting point bad? quap() uses a “hill climbing” algorithm to search for the posterior mode. If the starting point is very far from the posterior mode and the posterior is very flat at the starting point, it is hard to know in which direction to begin the search for the mode – the local terrain doesn’t give much of a clue. So quap() may wander around aimlessly in a vast plain of low posterior density and never get close to the posterior mode. Also, if the posterior is multi-modal, we might get drawn to the wrong local maximum.

Investigating the model

Quick summary of marginal posterior distributions:

library(rethinking)
precis(m4.1)
##             mean        sd       5.5%      94.5%
## mu    154.606969 0.4119838 153.948539 155.265398
## sigma   7.731128 0.2913667   7.265468   8.196789

5.5% and 94.5% indicate that we are seeing 89% HDIs.

Why 89%? It’s just the default… I don’t recommend 95% intervals because readers will have a hard time not viewing them as significance tests….

McElreath probably goes a bit overboard here, but he is correct that 95% is just a common convention; there is nothing particularly special about it. And when working with normal distributions, it doesn’t matter what percentage you use since they all communicate exactly the same information. (Once you know any percentage, you can work out the standard deviation and mean, so you know everything there is to know.) But it isn’t clear that choosing a different arbitrary number (89%) is any better than using the conventional arbitrary number (95%). In any case, you can choose whatever percent you like:

precis(m4.1, prob = .92)
##             mean        sd         4%       96%
## mu    154.606969 0.4119838 153.885714 155.32822
## sigma   7.731128 0.2913667   7.221037   8.24122
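
As an aside, the “same information” claim above can be checked numerically from the precis(m4.1) output (an added sketch):

half_width <- (155.265398 - 153.948539) / 2   # half of the 89% interval for mu
half_width / qnorm(0.945)                     # recovers the posterior sd of mu, about 0.412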

Posterior samples

Post4.1 <- extract.samples(m4.1)
gf_histogram(~mu, data = Post4.1) /
  gf_histogram(~sigma, data = Post4.1) 

gf_point(mu ~ sigma, data = Post4.1, alpha = 0.3) %>%
  gf_density2d(color = "skyblue")

That narrow prior

Let’s take a look at how the posterior changes if we use the super narrow prior for \(\mu\):

m4.2 <- 
  quap(
    data = HowellAdults,
    alist(
      height ~ dnorm(mu, sigma),
      mu     ~ dnorm(178, 0.1),
      sigma  ~ dunif(0, 50)
    )
  )
Post4.2 <- extract.samples(m4.2)
gf_dens(~ mu, data = Post4.1, color = ~ "Model 1") %>%
  gf_dens(~ mu, data = Post4.2, color = ~ "Model 2") 
gf_density2d( mu ~ sigma, data = Post4.1, color = ~"Model 1") %>%
gf_density2d( mu ~ sigma, data = Post4.2, color = ~"Model 2")

Notice that the two models have very different posterior ideas about what \(\mu\) and \(\sigma\) might be. Do you see why?


Model 3: Height depends (linearly) on weight

Model 3 (Math)

Here’s one way to express this model:

\[\begin{align*} h_i & \sim \sf{Norm}(\mu_i, \sigma) \\ \mu_i & = \beta_0 + \beta_1 * weight_i \\ \beta_0 & \sim ?? \\ \beta_1 & \sim ?? \\ \sigma & \sim ?? \\ \end{align*}\]

But we will prefer a different way:

\[\begin{align*} h_i & \sim \sf{Norm}(\mu_i, \sigma) \\ \mu_i & = \alpha_0 + \beta_1 \cdot (weight_i - \overline{weight}) \\ \alpha_0 & \sim ?? \\ \beta_1 & \sim ?? \\ \sigma & \sim ?? \\ \end{align*}\]

Why do we prefer this centered model?

  1. It is easier to come up with a prior for \(\alpha_0\) than for \(\beta_0\) because \(\beta_0\) is hard to interpret. (On average, how tall are people who weigh nothing?)

  2. Centering reduces or eliminates correlation among parameters in the posterior.

Choosing priors and prior sampling

  • Since \(\alpha_0\) indicates the average height for a person of average weight, it makes sense to use the same prior as in our previous model.
  • We will also use the same (poor) prior for \(\sigma\).
  • Let’s focus on the prior for \(\beta_1\).

Which prior looks better?

mean_weight = mean( ~ weight, data = HowellAdults)
Prior4.3a <-
  tibble(
    alpha_0 = rnorm(100, 178, 20),
    beta_1  = rnorm(100, 0, 10),
    sigma   = runif(100, 0, 50)
  ) 
Prior4.3b <-  
  tibble(
    alpha_0 = rnorm(100, 178, 20),
    beta_1  = exp(rnorm(100, 0, 1)),
    sigma   = runif(100, 0, 50)
  ) 
Prior4.3a %>%
  gf_abline(slope = ~ beta_1, intercept = ~ alpha_0 - beta_1 * mean_weight,
            alpha = 0.3) %>%
  gf_hline(yintercept = ~ c(0, 272), color = "red", linetype = "dashed", data = NA) %>%
  gf_lims(x = c(30, 60), y = c(-100, 400)) %>%
  gf_labs(y = "height") /
 
Prior4.3b %>%
  gf_abline(slope = ~ beta_1, intercept = ~ alpha_0 - beta_1 * mean_weight, 
            alpha = 0.3) %>%
  gf_hline(yintercept = ~ c(0, 272), color = "red", linetype = "dashed", data = NA) %>%
  gf_lims(x = c(30, 60), y = c(-100, 400)) %>%
  gf_labs(y = "height") 

Let’s go with the second one.

\[\begin{align*} h_i & \sim \sf{Norm}(\mu_i, \sigma) \\ \mu_i & = \alpha_0 + \beta_1 \cdot (weight_i - \overline{weight}) \\ \alpha_0 & \sim \sf{Norm}(178, 20) \\ \beta_1 & \sim \sf{LogNorm}(0, 1) \\ \sigma & \sim \sf{Unif}(0, 50) \\ \end{align*}\]

If \(X\) is normal, why is \(e^X\) called lognormal? Because its logarithm is normally distributed. Confusing, but true.

Let \(X \sim \sf{LogNorm}(\mu, \sigma)\). That is, \(X = e^Y\) where \(Y \sim \sf{Norm}(\mu, \sigma)\). In R, the parameters are called meanlog and sdlog. Here are some example plots.

gf_dist('lnorm', meanlog = 0, sdlog = 1, 
        fill = ~ "LogNorm(0,1)", alpha = 0.3, geom = "area") %>%
# gf_dist('lnorm', meanlog = 0, sdlog = 2, 
#         fill = ~ "LogNorm(0,2)", alpha = 0.3, geom = "area") %>%
gf_dist('lnorm', meanlog = 0, sdlog = 1/2, 
        fill = ~ "LogNorm(0,1/2)", alpha = 0.3, geom = "area") %>%
gf_dist('lnorm', meanlog = 1, sdlog = 1, 
        fill = ~ "LogNorm(1,1)", alpha = 0.3, geom = "area") %>%
  gf_lims(x = c(0,15))
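
A quick added check that the definition above matches R’s lnorm functions: the quartiles of exponentiated normal draws should agree with qlnorm() (up to simulation noise).

set.seed(123)
quantile(exp(rnorm(1e5, 0, 1)), c(0.25, 0.5, 0.75))   # exp() of Norm(0, 1) draws
qlnorm(c(0.25, 0.5, 0.75), meanlog = 0, sdlog = 1)    # quartiles of LogNorm(0, 1)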

The \(\mu\) parameter (and not \(\sigma\)) determines the median of the log-normal distribution: since \(P(e^Y \le e^{\mu}) = P(Y \le \mu) = 0.5\), the median of \(X = e^Y\) is \(e^{\mu}\).

qlnorm(0.5, meanlog = 0, sdlog = 0:4)
## [1] 1 1 1 1 1
qlnorm(0.5, meanlog = 1, sdlog = 0:4)
## [1] 2.718282 2.718282 2.718282 2.718282 2.718282

The following table of statistics (also available at https://en.wikipedia.org/wiki/Log-normal_distribution) can be useful in determining good choices for \(\mu\) and \(\sigma\) in a prior.

statistic   expression
median      \(e^{\mu} = \exp(\mu)\)
mean        \(\exp\left(\mu + \frac{\sigma^2}{2}\right)\)
mode        \(\exp(\mu - \sigma^2)\)
variance    \(\left[\exp(\sigma^2) - 1\right] \exp(2\mu + \sigma^2)\)

So larger \(\mu\) and larger \(\sigma\) make for larger variance, but the impact of \(\sigma\) is greater (because of the squaring).
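
For example, here are these statistics for the \(\sf{LogNorm}(0, 1)\) prior we chose for \(\beta_1\) (an added check):

mu <- 0; sigma <- 1
c(median   = exp(mu),
  mean     = exp(mu + sigma^2 / 2),
  mode     = exp(mu - sigma^2),
  variance = (exp(sigma^2) - 1) * exp(2 * mu + sigma^2))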

Model 3 (quap)

m4.3 <- 
  quap(
    data = HowellAdults,
    alist(
      height ~ dnorm(mu, sigma),
      mu    <- alpha_0 + beta_1 * (weight - mean(weight)),
      alpha_0 ~ dnorm(178, 20),
      beta_1 ~ dlnorm(0, 1),
      sigma  ~ dunif(0, 50)
    )
  )

Posterior Explorations

Anything that depends on parameters has a posterior distribution.

Post4.3 <- m4.3 %>% extract.samples(1e4) %>%
  mutate(beta_0 = alpha_0 - beta_1 * mean(HowellAdults$weight))
( 
  gf_dens( ~ alpha_0 , data = Post4.3) |
  gf_blank(0 ~ 0)  %>% gf_theme(theme_void())
) /
  (
    gf_density2d(beta_1 ~ alpha_0, data = Post4.3) |
      gf_dens(beta_1 ~ ., data = Post4.3) 
  )

library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
## 
## Attaching package: 'GGally'
## The following object is masked from 'package:pander':
## 
##     wrap
ggpairs(Post4.3, upper = list(continuous = "density"))

Posterior Lines

gf_point(height ~ weight, data = HowellAdults) %>%
  gf_abline(
    slope = ~ beta_1, intercept = ~ beta_0, 
    color = "steelblue", alpha = 0.1,
    data = Post4.3 %>% head(100))


Model 4: Height depends on weight and sex

Model 4 (Math)

\[\begin{align*} h_i & \sim \sf{Norm}(\mu_i, \sigma) \\ \mu_i & = \alpha_0 + \beta_1 \cdot (weight_i - \overline{weight}) + \beta_2 \cdot male_i \\ \alpha_0 & \sim \sf{Norm}(178, 20) \\ \beta_1 & \sim \sf{LogNorm}(0, 1) \\ \beta_2 & \sim \sf{Norm}(0, 10) \\ \sigma & \sim \sf{Unif}(0, 50) \\ \end{align*}\]

Model 4 (quap)

Note: The male variable is either 0 (for females) or 1 (for males).
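
A quick added look at how male is coded in these data:

tally( ~ male, data = HowellAdults)        # counts of 0 (female) and 1 (male)
mean(height ~ male, data = HowellAdults)   # average height within each group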

m4.4 <- 
  quap(
    data = HowellAdults,
    alist(
      height ~ dnorm(mu, sigma),
      mu    <- alpha_0 + beta_1 * (weight - mean(weight)) + beta_2 * male,
      alpha_0 ~ dnorm(178, 20),
      beta_1 ~ dlnorm(0, 1),
      beta_2 ~ dnorm(0, 10),
      sigma  ~ dunif(0, 50)
    )
  )
Post4.4 <- extract.samples(m4.4, n = 1e3) %>%
  mutate(beta_0 = alpha_0 - beta_1 * mean_weight)
precis(m4.4)
##                mean         sd        5.5%       94.5%
## alpha_0 151.5613770 0.33723981 151.0224027 152.1003514
## beta_1    0.6407607 0.04124548   0.5748424   0.7066789
## beta_2    6.4840091 0.53268348   5.6326780   7.3353402
## sigma     4.2537758 0.16031290   3.9975649   4.5099868
ggpairs(Post4.4, upper = list(continuous = "density"))

gf_point(height ~ weight, color = ~male, data = HowellAdults) %>%
  gf_abline(slope = ~ beta_1, intercept = ~ beta_0, alpha = 0.1,
            color = ~0,
            data = Post4.4 %>% head(100)) %>%
  gf_abline(slope = ~ beta_1, intercept = ~ beta_0 + beta_2, alpha = 0.1,
            color = ~1,
            data = Post4.4 %>% head(100))

last_plot() %>%
  gf_facet_grid( ~ male)

More Posterior Predictions

In this section, we want to refine our approach of plotting 100 posterior lines on the plot in two ways:

  1. Instead of 100 individual lines, we would like to see the upper and lower bounds that contain some fraction of the lines (at each weight).

  2. In addition to plotting the lines, which represent average height, we would like to see a range of plausible values for individual heights (at each weight).

Before dealing with each of these at each weight, we’ll start by looking at just one weight and then figure out how to scale up.

Average and individual heights for people who weigh 50 kg

What does our model say about the heights of people who weigh 50 kg?

  • On average their height should be \(\mu = \alpha_0 + \beta_1 (50 - \overline{weight})\)

  • An individual height should be a draw from a \(\sf{Norm}(\alpha_0 + \beta_1 (50 - \overline{weight}), \sigma)\)-distribution.

Keep two things in mind

  1. We don’t have values for the parameters (\(\alpha_0\), \(\beta_1\), and \(\sigma\)); we have (posterior) distributions. So we will have distributions as answers to these new questions as well. We may choose to summarize the distributions with a number (perhaps the mean or the median), or a range (an HDI), but these are fundamentally distributions.

  2. The posterior distribution is what the model says (based on the data, the likelihood, and the priors). That doesn’t make it true. So in addition to figuring out what a model says, we will want to investigate whether the model should be believed in the first place. Part of that can be determined by comparing what the model says to what we know or think we know, including by comparing things to the data used to fit the model. (But performing well on the data used to fit the model is a very low bar – and fitting very well can actually be a bad thing. More on this later.)

OK. Let’s compute those posterior distributions. We’ll do this by adding additional columns to our posterior samples.

n <- nrow(Post4.3)
Post4.3 <-
  Post4.3 %>%
  mutate(
    mean50 = beta_0 + beta_1 * 50,
    individual50 = rnorm(n, mean50, sigma)
  )

Post4.3 %>%
  gf_dens( ~ mean50, color = ~ "mean50") %>%
  gf_dens( ~ individual50, color = ~ "individual50") %>%
  gf_labs(x = "height | weight = 50")

Post4.3 %>% mean_hdi(mean50)
##     mean50   .lower  .upper .width .point .interval
## 1 159.1285 158.4729 159.821   0.95   mean       hdi
Post4.3 %>% mean_hdi(individual50)
##   individual50  .lower   .upper .width .point .interval
## 1     159.0531 148.564 168.4919   0.95   mean       hdi
gf_point(height ~ weight, data = HowellAdults) %>%
  gf_pointrange(individual50 + .lower + .upper ~ 50, 
                color = ~ "individual", size = 1.2,
                data = Post4.3 %>% mean_hdi(individual50)) %>%
  gf_pointrange(mean50 + .lower + .upper ~ 50, 
                color = ~ "average", size = 0.4,
                data = Post4.3 %>% mean_hdi(mean50)) 

As we see (and as makes sense), the model makes much more precise estimates about the average than about individuals.

Now we would like to do this same thing for weights all across the range of weights in our data.

Scaling up – doing it ourselves

First, let’s write a couple of functions that take posterior samples and a weight and produce an HDI for the mean or individual height.

avg_height <- function(post, weight, prob = 0.95) {
  post %>%
    mutate(mean_height = beta_0 + beta_1 * weight) %>%
    mean_hdi(mean_height, .width = prob) %>%
    mutate(weight = weight) 
}

ind_height <- function(post, weight, prob = 0.95) {
  n <- nrow(post)
  post %>%
    mutate(ind_height = rnorm(n, beta_0 + beta_1 * weight, sigma)) %>%
    mean_hdi(ind_height, .width = prob) %>%
    mutate(weight = weight)
}

avg_height(Post4.3, 50)
##   mean_height   .lower  .upper .width .point .interval weight
## 1    159.1285 158.4729 159.821   0.95   mean       hdi     50
ind_height(Post4.3, 50)
##   ind_height   .lower   .upper .width .point .interval weight
## 1   159.2391 149.1294 169.1683   0.95   mean       hdi     50
avg_height(Post4.3, 40)
##   mean_height   .lower   .upper .width .point .interval weight
## 1    150.0943 149.4073 150.7312   0.95   mean       hdi     40
ind_height(Post4.3, 40)
##   ind_height   .lower   .upper .width .point .interval weight
## 1    150.124 139.8024 159.9972   0.95   mean       hdi     40

Now we want to apply these functions to a lot of different weights.

Avg <- map_dfr(seq(30, 65, by = 1), ~ avg_height(Post4.3, .x))
Ind <- map_dfr(seq(30, 65, by = 1), ~ ind_height(Post4.3, .x))
Avg %>% reactable::reactable()
Ind %>% reactable::reactable()

And finally, we can plot the results

gf_point(height ~ weight, data = HowellAdults) %>%
  gf_line(.lower ~ weight, data = Avg, color = ~"avg") %>%
  gf_line(.upper ~ weight, data = Avg, color = ~"avg") %>%
  gf_line(.lower ~ weight, data = Ind, color = ~"ind") %>%
  gf_line(.upper ~ weight, data = Ind, color = ~"ind") 

gf_ribbon(.lower + .upper ~ weight, data = Ind, fill = ~"ind") %>%
  gf_ribbon(.lower + .upper ~ weight, data = Avg, fill = ~"avg") %>%
  gf_point(height ~ weight, data = HowellAdults, inherit = FALSE) 

Notice that most but not all of the dots fall in the blue band. That’s a good sign. The model predicts that 95% of heights will fall in the blue region.
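
We can check that claim roughly against the data (an added sketch that matches each observed weight to the nearest value on the 1-kg grid used to build Ind):

HowellAdults %>%
  mutate(weight_grid = round(weight)) %>%
  left_join(Ind, by = c("weight_grid" = "weight")) %>%
  summarise(prop_in_band = mean(height >= .lower & height <= .upper, na.rm = TRUE))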

This is a useful exercise for understanding how posterior sampling works, and it prepares you for the day you need to do something special for which there is no pre-made tool.

But creating posterior distributions of averages and simulated individuals is very common, so we’d like a simpler way to get those.
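
The rethinking package provides link() and sim() for exactly this purpose. Here is a small preview sketch of how they are typically used; the details come later:

mu_post     <- link(m4.3)               # posterior samples of mu at each observed weight
height_post <- sim(m4.3)                # simulated individual heights at each observed weight
apply(mu_post, 2, PI, prob = 0.95)      # 95% intervals for the average height
apply(height_post, 2, PI, prob = 0.95)  # 95% intervals for individual heights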

Questions to ask about/with a model

We will continue to address all of these as we go along.