Reminder

\[ \LARGE \mbox{posterior} \propto \mbox{prior} \cdot \mbox{likelihood} \]
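
Because products of many small numbers underflow in floating point, the code below works on the log scale, where the product becomes a sum:

\[ \log \mbox{posterior} = \log \mbox{prior} + \log \mbox{likelihood} + C \]

Here \(C\) is a constant (the log of the normalizing constant) that disappears when we normalize.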

Two KidsFeet models

Length

library(tidyverse)   # expand_grid(), filter(), mutate(), and purrr's map2_dbl()
library(ggformula)   # the gf_ plotting functions
library(mosaic)      # df_stats() and (via mosaicData) the KidsFeet data
library(patchwork)   # the | operator used below to place plots side by side
LengthGrid <-
  expand_grid(
    mu = seq(15, 30, length.out=101),
    sigma = seq(0, 20, length.out = 101)
  )  %>% 
  filter(sigma > 0) %>%
  mutate(
    logprior = 
      dnorm(mu, mean = 25, sd=10, log = TRUE) + 
      dunif(sigma, 0, 20, log = TRUE),
    loglikelihood = 
      map2_dbl(
        mu, sigma,
        ~ sum(dnorm(KidsFeet$length, mean = .x, sd = .y, log = TRUE))
      ),
    logposterior = logprior + loglikelihood,
    posterior = exp(logposterior) / sum(exp(logposterior))
  ) 

head(LengthGrid)
## # A tibble: 6 x 6
##      mu sigma logprior loglikelihood logposterior posterior
##   <dbl> <dbl>    <dbl>         <dbl>        <dbl>     <dbl>
## 1    15   0.2    -6.72       -46885.      -46892.         0
## 2    15   0.4    -6.72       -11728.      -11735.         0
## 3    15   0.6    -6.72        -5228.       -5235.         0
## 4    15   0.8    -6.72        -2959.       -2966.         0
## 5    15   1      -6.72        -1912.       -1919.         0
## 6    15   1.2    -6.72        -1346.       -1353.         0
LengthGrid %>%
  gf_contour_filled( posterior ~ mu + sigma) %>%
  gf_contour( posterior ~ mu + sigma, color = "white") 

Zooming in

LengthGrid %>%
  gf_contour_filled( posterior ~ mu + sigma) %>%
  gf_contour( posterior ~ mu + sigma, color = "white") %>%
  gf_lims(x = c(24, 25.5), y = c(0, 3))
## Warning: Removed 9935 rows containing non-finite values (stat_contour_filled).
## Warning: Removed 9935 rows containing non-finite values (stat_contour).

(These warnings are expected: gf_lims() converts the grid points outside the zoom window to NAs, which the contour layers then drop.)

Remaking the Grid

LengthGrid <-
  expand_grid(
    mu = seq(23, 26, length.out=101),
    sigma = seq(0, 3, length.out = 101)
  )  %>% 
  filter(sigma > 0) %>%
  mutate(
    logprior = 
      dnorm(mu, mean = 25, sd=10, log = TRUE) + 
      dunif(sigma, 0, 20, log = TRUE),
    loglikelihood = 
      map2_dbl(
        mu, sigma,
        ~ sum(dnorm(KidsFeet$length, mean = .x, sd = .y, log = TRUE))
      ),
    logposterior = logprior + loglikelihood,
    posterior = exp(logposterior) / sum(exp(logposterior))
  ) 

head(LengthGrid)
## # A tibble: 6 x 6
##      mu sigma logprior loglikelihood logposterior posterior
##   <dbl> <dbl>    <dbl>         <dbl>        <dbl>     <dbl>
## 1    23  0.03    -6.24      -100877.     -100883.         0
## 2    23  0.06    -6.24       -25171.      -25177.         0
## 3    23  0.09    -6.24       -11162.      -11168.         0
## 4    23  0.12    -6.24        -6264.       -6270.         0
## 5    23  0.15    -6.24        -4001.       -4007.         0
## 6    23  0.18    -6.24        -2774.       -2780.         0
LengthGrid %>%
  gf_contour_filled( posterior ~ mu + sigma) %>%
  gf_contour( posterior ~ mu + sigma, color = "white") 

Does length depend on width?

  • Data: KidsFeet$length

  • Likelihood: \(\mbox{length} \sim \operatorname{Norm}(\mu, \sigma)\)

  • \(\mu\) depends on width: \(\mu = \beta_0 + \beta_1 \cdot \mbox{width}\)

    • Note: we could use any function here, but for now we will use a linear function.
  • Prior – we need priors for 3 parameters: \(\beta_0\), \(\beta_1\), \(\sigma\)

    • For this simple example, we will use uniform priors for each, even though these are not particularly good priors.

    • We still need to specify the boundaries for the uniform distributions; we’ll start with some pretty wide uniform priors.

FootGrid <-
  expand_grid(
    beta_0 = seq(-20, 20, length.out = 101),
    beta_1 = seq(-1, 10, length.out = 101),
    sigma = seq(0, 10, length.out = 101)
  )  %>% 
  filter(sigma > 0) %>%
  mutate(
    logprior = 0 + 0 + 0,   # uniform priors contribute only a constant, so 0 works
    loglikelihood = 
      pmap_dbl( 
        list(beta_0, beta_1, sigma),
        ~ sum(dnorm(KidsFeet$length, 
                    mean = ..1 + ..2 * KidsFeet$width, 
                    sd = ..3,
                    log = TRUE),
              na.rm = TRUE)
      ),
    logposterior = logprior + loglikelihood,
    posterior = exp(logposterior)
  )

head(FootGrid)
## # A tibble: 6 x 7
##   beta_0 beta_1 sigma logprior loglikelihood logposterior posterior
##    <dbl>  <dbl> <dbl>    <dbl>         <dbl>        <dbl>     <dbl>
## 1    -20     -1   0.1        0     -5631792.    -5631792.         0
## 2    -20     -1   0.2        0     -1407934.    -1407934.         0
## 3    -20     -1   0.3        0      -625749.     -625749.         0
## 4    -20     -1   0.4        0      -351990.     -351990.         0
## 5    -20     -1   0.5        0      -225283.     -225283.         0
## 6    -20     -1   0.6        0      -156456.     -156456.         0
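
One caution, as an aside: posterior = exp(logposterior) is not normalized here, and since even the best rows have log posteriors around -55, the weights are tiny (around 1e-24). That is fine for slice_sample() below, which only needs weights proportional to the posterior, but with more data exp() could underflow to 0 for every row. A common safeguard (a sketch, not part of the original calculation) is to rescale before exponentiating:

FootGrid %>%
  mutate(posterior = exp(logposterior - max(logposterior)))  # largest weight becomes exactly 1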

A plot of the joint posterior is trickier now: with 3 parameters we would need a 4-d plot. So we would like to look at some marginal distributions instead. (And we also want to look at marginals in the previous example.)

Sampling from the Posterior

We could compute marginals from a grid by summing over rows that have the same value for one or more of the parameters. (This is like adding along rows or columns in the frequency tables we looked at a few days ago.) A small example of this approach is sketched below.
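
For instance, a minimal sketch of the marginal posterior for \(\mu\) in the length model, obtained by summing the normalized posterior over all values of \(\sigma\):

LengthGrid %>%
  group_by(mu) %>%
  summarise(marginal = sum(posterior)) %>%  # add up over sigma
  gf_line(marginal ~ mu)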

But we will opt for a different method based on sampling. We do this for two reasons: samples are very easy to work with (the marginal for a parameter is just that parameter’s column of the sample), and the same approach carries over to posterior sampling methods (like the ones we will see later) that do not use a grid at all.

The idea is to generate a large random sample from the posterior distribution. Of course, we want to sample more often where the posterior is larger. We can do this using slice_sample().

FootPosteriorSamples <- 
  FootGrid %>%
  slice_sample(n = 1e4, weight_by = posterior, replace = TRUE)

head(FootPosteriorSamples)
## # A tibble: 6 x 7
##   beta_0 beta_1 sigma logprior loglikelihood logposterior posterior
##    <dbl>  <dbl> <dbl>    <dbl>         <dbl>        <dbl>     <dbl>
## 1   12     1.42     1        0         -55.6        -55.6  7.23e-25
## 2    4     2.3      1        0         -57.3        -57.3  1.26e-25
## 3    8     1.86     1        0         -55.5        -55.5  8.13e-25
## 4    8     1.86     1        0         -55.5        -55.5  8.13e-25
## 5    8     1.86     1        0         -55.5        -55.5  8.13e-25
## 6    9.2   1.75     1        0         -56.2        -56.2  3.93e-25

Marginals for individual parameters:

FootPosteriorSamples %>%
  gf_histogram(~ beta_0, bins = 20) | 
FootPosteriorSamples %>%
  gf_dens(~ beta_1, adjust = 3) |
FootPosteriorSamples %>%
  gf_dens(~ sigma, adjust = 3)

Marginals for pairs of parameters:

FootPosteriorSamples %>% 
  gf_density2d(beta_1 ~ beta_0) |
FootPosteriorSamples %>% 
  gf_density2d(sigma ~ beta_0) |
FootPosteriorSamples %>% 
  gf_density2d(sigma ~ beta_1) 

Refining our grid

Let’s refine our grid a bit – zooming in on the region where the posterior density is largest.

FootGrid2 <-
  expand_grid(
    beta_0 = seq(0, 20, length.out = 101),
    beta_1 = seq(0.5, 3, length.out = 101),
    sigma = seq(0.2, 3, length.out = 101)
  )  %>% 
  filter(sigma > 0) %>%
  mutate(
    logprior = 0 + 0 + 0,   # uniform priors contribute only a constant, so 0 works
    loglikelihood = 
      pmap_dbl( 
        list(beta_0, beta_1, sigma),
        ~ sum(dnorm(KidsFeet$length, 
                    mean = ..1 + ..2 * KidsFeet$width, 
                    sd = ..3,
                    log = TRUE),
              na.rm = TRUE)
      ),
    logposterior = logprior + loglikelihood,
    posterior = exp(logposterior)
  )

head(FootGrid2)
## # A tibble: 6 x 7
##   beta_0 beta_1 sigma logprior loglikelihood logposterior posterior
##    <dbl>  <dbl> <dbl>    <dbl>         <dbl>        <dbl>     <dbl>
## 1      0    0.5 0.2          0      -200074.     -200074.         0
## 2      0    0.5 0.228        0      -153949.     -153949.         0
## 3      0    0.5 0.256        0      -122115.     -122115.         0
## 4      0    0.5 0.284        0       -99224.      -99224.         0
## 5      0    0.5 0.312        0       -82215.      -82215.         0
## 6      0    0.5 0.34         0       -69233.      -69233.         0
FootPosteriorSamples <- 
  FootGrid2 %>%
  slice_sample(n = 1e4, weight_by = posterior, replace = TRUE)

head(FootPosteriorSamples)
## # A tibble: 6 x 7
##   beta_0 beta_1 sigma logprior loglikelihood logposterior posterior
##    <dbl>  <dbl> <dbl>    <dbl>         <dbl>        <dbl>     <dbl>
## 1    9.4   1.7  1.15         0         -56.0        -56.0  4.68e-25
## 2    9.6   1.68 0.900        0         -55.8        -55.8  5.81e-25
## 3   13.2   1.27 1.04         0         -56.1        -56.1  4.51e-25
## 4    9.6   1.65 1.29         0         -58.4        -58.4  4.24e-26
## 5    7.4   1.95 1.38         0         -59.2        -59.2  1.89e-26
## 6    5.4   2.12 1.29         0         -58.7        -58.7  3.37e-26

Marginals for individual parameters:

FootPosteriorSamples %>%
  gf_histogram(~ beta_0, bins = 20) | 
FootPosteriorSamples %>%
  gf_dens(~ beta_1, adjust = 3) |
FootPosteriorSamples %>%
  gf_dens(~ sigma, adjust = 3)

Marginals for pairs of parameters:

FootPosteriorSamples %>% 
  gf_density2d(beta_1 ~ beta_0) |
FootPosteriorSamples %>% 
  gf_density2d(sigma ~ beta_0) |
FootPosteriorSamples %>% 
  gf_density2d(sigma ~ beta_1) 

That gives us a better picture of what the posterior really looks like.

Summarizing the Posterior Distribution

“Point Estimates”

We could summarize the posterior distribution by computing our “best estimate” for each parameter. That best estimate could be chosen using one of several different summaries. The most commonly used are

  • mean (“average” value in the posterior)
  • median (half the distribution below, half above)
  • mode (peak of the distribution)

Here we compute the median and the mean.1

df_stats( ~ beta_0 + beta_1 + sigma, data = FootPosteriorSamples, mean, median)
##   response     mean median
## 1   beta_0 9.813700   9.80
## 2   beta_1 1.657840   1.65
## 3    sigma 1.062126   1.04
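
As the footnote explains, estimating the mode is trickier. One simple approach (a sketch, not necessarily what the packages below do) is to take the peak of a kernel density estimate:

# estimate the mode of beta_1 as the location of the density peak
d <- density(FootPosteriorSamples$beta_1)
d$x[which.max(d$y)]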

But typically we want to convey the uncertainty as well, so…

Highest Posterior Density Intervals (HPDI)

You may be familiar with confidence intervals from previous courses. The nearest equivalent to that in Bayesian inference is the HPDI: the narrowest interval that contains a given proportion (say, 95%) of the posterior probability.

Several packages (pretty much any package that works with Bayesian models) provide a way to compute these intervals.
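
To see what these functions compute, here is a by-hand sketch (hpdi_by_hand is a hypothetical helper; it assumes a unimodal posterior, so the narrowest interval containing 95% of the samples is the HPDI):

hpdi_by_hand <- function(x, prob = 0.95) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(prob * n)               # samples each candidate interval must contain
  widths <- x[k:n] - x[1:(n - k + 1)]  # widths of all windows of k consecutive samples
  i <- which.min(widths)               # the narrowest window wins
  c(lo = x[i], hi = x[i + k - 1])
}
hpdi_by_hand(FootPosteriorSamples$beta_1)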

library(rethinking)
HPDI(FootPosteriorSamples$beta_1, prob = 0.95)
## |0.95 0.95| 
## 1.025 2.300
library(tidybayes)
mode_hdi(FootPosteriorSamples, beta_1)
## # A tibble: 1 x 6
##   beta_1 .lower .upper .width .point .interval
##    <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1   1.64   1.02    2.3   0.95 mode   hdi
gf_dens(~ beta_1, data = FootPosteriorSamples) %>%
  gf_pointrange(
    0 ~ beta_1 + .lower + .upper, 
   data = mode_hdi(FootPosteriorSamples, beta_1)
  )

library(CalvinBayes)
plot_post(FootPosteriorSamples$beta_1, hdi_prob = 0.95)

## $posterior
##        ESS    mean median     mode
## var1 10000 1.65784   1.65 1.644868
## 
## $hdi
##   prob    lo  hi
## 1 0.95 1.025 2.3
library(bayesplot)
mcmc_dens(FootPosteriorSamples, pars = "beta_1")

mcmc_hist(FootPosteriorSamples, pars = "beta_1")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

quap()

Grid approximation becomes impractical as the number of parameters grows. The quap() function from the rethinking package avoids the grid entirely: it finds the posterior mode numerically and approximates the posterior with a multivariate normal distribution (a quadratic approximation to the log posterior).

FootQ <-
  quap(
  alist(
    length ~ dnorm(mu, sigma),
    mu    <- beta_0 + beta_1 * width,
    beta_0 ~ dunif(-20, 20),
    beta_1 ~ dunif(-1, 20),
    sigma  ~ dunif(0.1, 20)
  ),
  data = KidsFeet
)

FootQPostSamples <- FootQ %>% extract.samples(n = 1e4) 
head(FootQPostSamples)
##      beta_0   beta_1     sigma
## 1 13.270007 1.261762 1.0792452
## 2 16.144995 0.969353 0.8815271
## 3  8.423935 1.821491 0.8486466
## 4 12.348382 1.364799 0.9329696
## 5 11.573704 1.479493 1.0320564
## 6  7.827031 1.893130 0.8990027
mode_hdi(FootQPostSamples)
##     beta_0 beta_0.lower beta_0.upper   beta_1 beta_1.lower beta_1.upper
## 1 10.03133     4.266992     15.35381 1.630204     1.022051     2.254457
##      sigma sigma.lower sigma.upper .width .point .interval
## 1 1.011218   0.7788652    1.223042   0.95   mode       hdi
plot_post(FootQPostSamples$beta_1) %>% class()

## [1] "list"
gf_dens(~beta_1, data = FootQPostSamples)

We can visualize our uncertainty about the regression line by plotting the lines corresponding to a sample of posterior draws:

gf_point(length ~ width, data = KidsFeet) %>%
  gf_abline(
    slope = ~ beta_1, intercept = ~ beta_0, 
    data = FootQPostSamples %>% head(100),
    alpha = 0.2, color = "steelblue")

Prior sampling

We can sample from priors too. This lets us see what the model believes before it sees any data; a sketch follows.
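
For example, here is a minimal sketch (FootPriorSamples is a hypothetical name; the rethinking package also provides extract.prior() for quap models) that draws from the uniform priors declared in FootQ and overlays the implied regression lines on the data:

# draw directly from the priors used in the FootQ model
FootPriorSamples <-
  tibble(
    beta_0 = runif(1e4, -20, 20),
    beta_1 = runif(1e4, -1, 20),
    sigma  = runif(1e4, 0.1, 20)
  )

# prior predictive lines: most are wildly implausible, one sign that these
# flat priors are not particularly good
gf_point(length ~ width, data = KidsFeet) %>%
  gf_abline(
    slope = ~ beta_1, intercept = ~ beta_0,
    data = FootPriorSamples %>% head(100),
    alpha = 0.2, color = "red")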


  1. Computing the mode is a bit trickier. We are interested in estimating the location of the peak of the distribution, but simply using the value that appears most often in our sample isn’t a very good way to estimate this. We’ll see some functions that estimate the mode (and do other things at the same time) in just a moment.↩︎