Reminder

\[ \LARGE \mbox{posterior} \propto \mbox{prior} \cdot \mbox{liklihood} \]


From last time

Sampling the globe

  • Prior: \(p \sim \operatorname{Triangle}(0, 1, 0.7)\) [ p = dtriangle(p, 0, 1, 0.7)]

  • Data: 6 out of 9 water

  • Likelihood: \(p^6 \cdot (1-p)^3\)

library(triangle)
WaterGrid2 <-
  expand_grid(
    p = seq(0, 1, length.out = 501)
  ) %>%
  mutate(
    prior = dtriangle(p, 0, 1, 0.7),
    likelihood = p^6 * (1-p)^3,
    likelihood = likelihood / sum(likelihood) / 0.002,
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior) / 0.002
  )
gf_line( prior ~ p, data = WaterGrid2, color = ~ "prior") %>% 
gf_line( likelihood ~ p, data = WaterGrid2, color = ~ "likelihood", size = 1.7) %>%
gf_line( posterior ~ p, data = WaterGrid2, color = ~ "posterior")

Working on the log scale

Let’s redo this example to work on the log scale:

\[ \Large \begin{align*} \mbox{posterior} &= \mbox{constant} \cdot \mbox{prior} \cdot \mbox{likelihood}\\ \log(\mbox{posterior}) &= \log(\mbox{constant}) + \log(\mbox{prior}) + \log(\mbox{likelihood}) \end{align*} \] This is important because likelihoods can be very small when we have a lot of data. (Any particular data set is extremely unlikely.) Working on the log scale avoid underflow and precision issues in computational methods by keeping numbers in a more manageable range.

library(triangle)
WaterGrid3 <-
  expand_grid(
    p = seq(0, 1, length.out = 501)
  ) %>%
  mutate(
    logprior = log(dtriangle(p, 0, 1, 0.7)),
    loglikelihood = 6 * log(p) + 3 * log(1-p),      #  log(p^6 * (1-p)^3),
    logposterior = logprior + loglikelihood,
    posterior = exp(logposterior),
    posterior = posterior / sum(posterior)    # rescale
    )
gf_line( posterior ~ p, data = WaterGrid3, color = ~ "posterior")

Variations on the Theme – Using Functions in R

If we have code that we want to run several times with slight changes, it is better to create a function than to copy and paste. Let’s do that without previous example. We would like to be able to change

  • the resolution (how many points in the grid)
  • the prior
  • the data (which will change the likelihood)
make_water_grid <- function(water, total, prior, resolution = 500, ...) {
 
    expand_grid(
      p = seq(0, 1, length.out = resolution + 1)
    ) %>%
    mutate(
      logprior = log(prior(p, ...)),
      loglikelihood = water * log(p) + (total - water) * log(1-p),
      logposterior = logprior + loglikelihood,
      posterior = exp(logposterior),
      posterior = posterior / sum(posterior)    # rescale
    )
}

Example uses:

make_water_grid(water = 250, total = 300, resolution = 100, 
                prior = dtriangle, a = 0, b= 1, c = 0.7) %>%
  gf_line(posterior ~ p) |
make_water_grid(water = 250, total = 300, resolution = 100, prior = dunif) %>%
  gf_line(posterior ~ p)

make_water_grid(water = 6, total = 9, resolution = 100, 
                prior = dtriangle, a = 0, b= 1, c = 0.7) %>%
  gf_line(posterior ~ p) |
make_water_grid(water = 6, total = 9, resolution = 100, prior = dunif) %>%
  gf_line(posterior ~ p)

Models with more than one parameter

How long are Kids Feet?

  • Data: KidsFeet$length

  • Likelihood: \(h \sim \operatorname{Norm}(\mu, \sigma)\)

  • Prior

    • \(\mu \sim \operatorname{Norm}(25, 10)\)
    • \(\sigma \sim \operatorname{Unif}(0, 20)\)
library(purrr)
LengthGrid <-
  expand_grid(
    mu = seq(15, 30, length.out=101),
    sigma = seq(0, 20, length.out = 101)
  )  %>% 
  filter(sigma > 0) %>%
  mutate(
    logprior = 
      dnorm(mu, mean = 25, sd=10, log = TRUE) + 
      dunif(sigma, 0, 20, log = TRUE),
    loglikelihood = 
      map2_dbl(
        mu, sigma,
        ~ sum(dnorm(KidsFeet$length, mean =  .x, sd = .y, log= TRUE))
      ),
    logposterior = logprior + loglikelihood,
    posterior = exp(logposterior) / sum(exp(logposterior))
  ) 

LengthGrid %>%
  gf_contour( posterior ~ mu + sigma)

Note on map2_dbl():

  • this is important here because we want to use all the length values in the data for each combination of \(\mu\) and \(\sigma\) in our grid.

  • map2_dbl() let’s us tell R which are all and which are each.

Improvement

Now that we have created our plot, we see that the interesting stuff is happening on a much narrower window. Let’s focus our efforts there.

library(purrr)
LengthGrid2 <-
  expand_grid(
    mu = seq(23, 27, length.out = 201),
    sigma = seq(0, 3, length.out = 201)
  )  %>% 
  filter(sigma > 0) %>%
  mutate(
    logprior = 
      dnorm(mu, mean = 25, sd=10, log = TRUE) + 
      dunif(sigma, 0, 20, log = TRUE),
    loglikelihood = 
      map2_dbl(
        mu, sigma,
        ~ sum(dnorm(KidsFeet$length, mean =  .x, sd = .y, log= TRUE))
      ),
    logposterior = logprior + loglikelihood,
    posterior = exp(logposterior) / sum(exp(logposterior))
  ) 

LengthGrid2 %>%
  gf_contour_filled( posterior ~ mu + sigma) %>%
  gf_contour( posterior ~ mu + sigma, color = "white")

For another day

  • How do we get the marginals of the posterior (posteriors for \(\mu\) and \(\sigma\))

  • What else can we do with such a posterior?