Goals for today

  1. Fit models illustrating multicollinearity.

  2. Learn how to fit models with Stan (via rethinking::ulam()).

  3. Learn how to use bayesplot to create posterior plots.

Terrible Things

We have seen how to use both categorical and numerical predictors in our model, and we can add as many of them as we like. We have also seen (divorce rate example; SAT example) that sometimes we cannot understand the relationship between our response and a predictor of interest without including additional predictors.

So a natural idea is to always include lots of predictors. Unfortunately…

This chapter [ie chapter 6] and the next are both about terrible things that can happen when we simply add variables to a regression, without a clear idea of a causal model. [@mcelreathStatisticalRethinkingBayesian2020, pp. 161--162]

We’ll learn about one of these “terrible things” in a moment, but first a few words about fitting Bayesian models using Stan.

Stan and its ecosystem

Stan is a programming language for creating and fitting Bayesian models using a fancy version of the HMC algorithm. Stan is good at what it does, but it doesn’t do much else. To use Stan, we must translate our mathematical description of the model into the Stan programming language (which is a bit like C/C++, but not exactly). Stan then converts this description into C++, compiles it, and runs the resulting program to produce posterior samples.

Schematically, it looks something like this:

\[ \begin{array}{c} \mbox{model} \\ \downarrow \\ \mbox{Stan code} \\ \downarrow \\ \mbox{C++ code} \\ \downarrow \\ \mbox{compiled C++ code} \\ \downarrow \\ \mbox{posterior samples} \end{array} \]

To use Stan in its native form would require us to learn another programming language, one that is very good at what it does, but very limited in scope. Preprocessing the data and summarizing and visualizing the results are better done in some other language (like R).

There are several ways to make the connection between Stan and R.

  1. The rstan package provides a way to execute Stan code and return an R object with information about the model and its posterior samples.

    The Stan code can be read from an external file or passed in as a string. Both of these are a little bit clunky (and still require you to write in native Stan), so we won’t be doing this. But it is good to know a bit about this package, because several other solutions are built on top of it.

    The function used to compile and sample is called stan(), and the object it produces is a “stanfit” object. (A small example of this appears just after this list.)

  2. RMarkdown documents can have Stan chunks, just like they have R chunks.

    There is a way to pass data into the chunk and to retrieve the posterior samples from the chunk. The code in the chunk is passed to rstan for compiling and executing. This is a big improvement over writing Stan code in external files or in R strings. We mostly won’t do this, since we want to avoid working directly in Stan, but it is a great way to go if you want full control over your Stan models.

  3. The rethinking package provides a function called ulam() that works very much like quap(), but uses Stan (via rstan) rather than quadratic approximation.

    This has the big advantage of familiarity – there isn’t much more we need to learn since we already know about quap(). ulam() doesn’t give you access to everything Stan can do, but it covers a lot – including everything our author does in his book.

    ulam() mimics the way we describe models mathematically, and it forces you to express every bit of the model explicitly. Those are both good things when you are learning, but you aren’t likely to see this function used anywhere else.

    The result of running ulam() is an “ulam” object. You can extract the “stanfit” object from the “ulam” object u using stanfit(u) or u@stanfit.

  4. The brms package also provides a way to describe models that are then passed to rstan for compilation and sampling.

    The description of the models in brms looks quite different. The model description is much terser, so you need to learn its way of coding up a model. The posterior descriptions are a bit more verbose, and look a bit more like native Stan code and a bit less like R. This is a package that is used quite extensively by people fitting a wide range of models.

    The result of running brm() is a “brmsfit” object. You can extract the “stanfit” object from a “brmsfit” object b using stanfit(b) or b$fit.

  5. The bayesplot package can create many useful plots from a stanfit object or its posterior samples (or from MCMC samples created in other ways).
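
To make option 1 above concrete, here is a rough sketch of what calling rstan directly looks like, with the Stan program written in an R string. This is illustrative only (the model and data here are made up); we won’t fit models this way ourselves.

library(rstan)

stan_code <- "
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(10, 100);      // prior
  sigma ~ exponential(1);    // prior
  y ~ normal(mu, sigma);     // likelihood
}
"

fit <-
  stan(
    model_code = stan_code,                       # Stan program as a string
    data = list(N = 50, y = rnorm(50, 10, 2)),    # data must be a named list
    chains = 4, iter = 2000, warmup = 1000, seed = 6
  )
fit    # a "stanfit" object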

We will start off using ulam(), but may eventually learn brm() as well. So our workflow looks something like this.

\[ \begin{array}{ccccc} && \mbox{model} \\ & \swarrow && \searrow \\ \texttt{rethinking::ulam()} &&&& \texttt{brms::brm()} \\ \downarrow &&&& \downarrow \\ (\texttt{rstan::stan()}) &&&& (\texttt{rstan::stan()}) \\ \downarrow &&&& \downarrow \\ \mbox{ulam object} &&&& \mbox{brmsfit object} \\ \downarrow & \searrow && \swarrow & \downarrow \\ \downarrow &&\texttt{stanfit(object)} && \downarrow \\ \downarrow & &\downarrow& & \downarrow \\ \texttt{extract.samples()} &&\texttt{as.data.frame() }&& \texttt{posterior_samples()} \end{array} \]

  • The parentheses indicate that we don’t need to call stan() directly, but it will be used, and some of the arguments from ulam() and brm() are passed along to stan() to control how Stan is used.

  • The important thing to notice here is that we have two ways to get started, and three ways to end. The ending has been illustrated with the commands used to extract posterior samples as a data frame (or a list in the case of ulam). I may add some functions (like stanfit()) to CalvinBayes to help smooth over some of the differences, but it will still be good to keep track of whether you have an ulam object, a brmsfit object, a stanfit object, or some flavor of posterior samples created from (one of these versions of) our model.
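
To give a feel for the brms::brm() branch of the diagram, here is a rough, hypothetical sketch of what fitting a simple model with brm() might look like. We are not running this; MyData, y, and x are placeholders, and we will return to brms later.

library(brms)

b <-
  brm(
    y ~ x,                       # terse, formula-style model description
    data = MyData,
    family = gaussian(),
    prior = c(prior(normal(0, 10), class = Intercept),
              prior(normal(0, 10), class = b),
              prior(exponential(1), class = sigma)),
    chains = 4, iter = 2000, warmup = 1000, cores = 4, seed = 6
  )
b$fit                            # the underlying "stanfit" object
posterior_samples(b) %>% head()  # posterior draws as a data frame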

Multicollinearity

From our text:

Multicollinearity means a very strong association between two or more predictor variables. The raw correlation isn’t what matters. Rather what matters is the association, conditional on the other variables in the model. The consequence of multicollinearity is that the posterior distribution will seem to suggest that none of the variables is reliably associated with the outcome, even if all the variables are in reality strongly associated with the outcome.

Legs

Let’s begin by simulating some leg-length data.

set.seed(909)
n <- 100

Legs <- 
  tibble(height   = rnorm(n, mean = 10, sd = 2),
         leg_prop = runif(n, min = 0.4, max = 0.5)) %>% 
  mutate(leg_left  = leg_prop * height + rnorm(n, mean = 0, sd = 0.02),
         leg_right = leg_prop * height + rnorm(n, mean = 0, sd = 0.02))

As you might expect in real-life data, the leg_left and leg_right columns are highly correlated.

Legs %>%
  select(leg_left:leg_right) %>%
  cor() %>%
  round(digits = 4)
##           leg_left leg_right
## leg_left    1.0000    0.9997
## leg_right   0.9997    1.0000

Have you ever even seen a \(\rho = .9997\) correlation before? Here it is in a plot.

Legs %>%
  ggplot(aes(x = leg_left, y = leg_right)) +
  geom_point(alpha = 1/2, color = "forestgreen")

Model 6.1

Here’s our attempt to predict height with both legs.

m6.1 <- 
  quap(
    data = Legs, 
    alist(
      height ~ dnorm(mu, sigma),
      mu <- b0 + bl * leg_left + br * leg_right,
      b0 ~ dnorm(10, 100),      # these are pretty silly, vague priors; don't mimic!
      bl ~ dnorm(2, 10),
      br ~ dnorm(2, 10),
      sigma ~ dexp(1)
    )
  )

We can use Stan (an implementation of HMC) to fit this model as well. We just

  • Change quap() to ulam();
  • Add a few arguments to give instructions to the HMC sampler (how many chains, iterations, etc.)

There are a few other bonus features we will look at eventually as well. Let’s give it a go.

u6.1 <- 
  ulam(
    data = Legs, 
    alist(
      height ~ dnorm(mu, sigma),
      mu <- b0 + bl * leg_left + br * leg_right,
      b0 ~ dnorm(10, 100),      # these are pretty silly, vague priors; don't mimic!
      bl ~ dnorm(2, 10),
      br ~ dnorm(2, 10),
      sigma ~ dexp(1)
    ),
    chains = 4,         # run from 4 different starting points
    iter = 2000,        # use 2000 iterations per chain
    warmup = 1000,      # use 1000 iterations to tune parameters (per chain)
    cores = 4,          # spread computation over 4 cores
    seed = 6,           # set seed used for randomization in HMC
    file = "fits/u6.1"  # load model from a file if it exists; create file if it does not
  )

Saving models to a file

Note the file argument. This will save lots of time when you knit your file repeatedly: instead of refitting the model each time, ulam() will simply reload the previous fit if the file already exists. But be sure to delete the file if you change your model. As far as I can tell, ulam() is not clever enough to do this for you when the model changes, even though it knows the model description.
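
One low-tech way to handle this is to delete the cached fit whenever you edit the model. A sketch (the exact file name may or may not include an .rds extension, depending on your version of rethinking):

# remove any cached u6.1 fit so that ulam() recompiles and refits the model
file.remove(list.files("fits", pattern = "^u6\\.1", full.names = TRUE))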

If you ever want to see what model you really have, just type the name of the model.

u6.1
## Hamiltonian Monte Carlo approximation
## 4000 samples from 4 chains
## 
## Sampling durations (seconds):
##         warmup sample total
## chain:1   5.03   6.96 11.99
## chain:2   5.56   7.14 12.70
## chain:3   5.67   7.03 12.70
## chain:4   5.89   7.04 12.93
## 
## Formula:
## height ~ dnorm(mu, sigma)
## mu <- b0 + bl * leg_left + br * leg_right
## b0 ~ dnorm(10, 100)
## bl ~ dnorm(2, 10)
## br ~ dnorm(2, 10)
## sigma ~ dexp(1)

Inspecting the model

Let’s inspect the damage.

precis(m6.1)
##            mean         sd       5.5%     94.5%
## b0    0.9812791 0.28395540  0.5274635 1.4350947
## bl    0.2118585 2.52703706 -3.8268348 4.2505518
## br    1.7836774 2.53125061 -2.2617500 5.8291047
## sigma 0.6171026 0.04343427  0.5476862 0.6865189
precis(u6.1)
##            mean         sd       5.5%     94.5%    n_eff    Rhat4
## b0    0.9686079 0.29417550  0.4978517 1.4409229 1913.726 1.000958
## bl    0.2114645 2.52527192 -3.7527993 4.2509907 1286.363 1.008659
## br    1.7867572 2.53122802 -2.2684078 5.7869395 1285.999 1.008721
## sigma 0.6326543 0.04590486  0.5636909 0.7105869 1911.022 1.000515

We have a couple new columns in the output.

  • n_eff is the effective sample size. We collected 1000 post-warmup samples per chain, so 4000 total. These effective sample sizes suggest that we are sampling reasonably efficiently.

  • Rhat4 should approach 1 (from above) if the algorithm is converging. Anything greater than 1.1 is a red flag. We’d prefer to be below 1.05. These look good. [More on the metric later.]
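
If you prefer pictures, bayesplot can display these diagnostics directly. Something like the following should work (a sketch; rhat() and neff_ratio() extract the diagnostics from the stanfit object, and the mcmc_ functions plot them):

u6.1 %>% stanfit() %>% rhat() %>% mcmc_rhat()
u6.1 %>% stanfit() %>% neff_ratio() %>% mcmc_neff()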

So our HMC algorithm appears to be working and agreeing with the quadratic approximation. But…

That ‘sd’ column isn’t looking too good. But it’s easy to miss that, which is why McElreath suggested “a graphical view of the [output] is more useful because it displays the posterior means and [intervals] in a way that allows us with a glance to see that something has gone wrong here” (p. 164).

plot(coeftab(m6.1, u6.1))

As we move toward using HMC, we can take advantage of the bayesplot package, which provides many nice visualizations. Here’s the bayesplot version of the previous plot (just for our HMC model):
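
One way to do that is with mcmc_intervals() (a sketch; several other bayesplot functions would work just as well):

u6.1 %>%
  stanfit() %>%
  mcmc_intervals(pars = vars(-lp__))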

Different random legs

In the middle of page 164, McElreath suggested we might try this again with different seeds. Our first step will be to make a custom function that will simulate new data of the same form as above and then immediately fit a model based on u6.1 to those new data. To speed up the process, we’ll pass the already-compiled model to stan() (via its fit argument) so that Stan does not have to recompile it. Our custom function, sim_and_fit(), will take two arguments. The seed argument will allow us to keep the results reproducible by setting a seed for the data simulation. The n argument will allow us, should we wish, to change the size of our simulated sample.

sim_and_fit <- function(seed, n = 100, ...) {
  
  set.seed(seed)
  
  # simulate the new data
  NewLegs <- 
    tibble(height   = rnorm(n, mean = 10, sd = 2),
           leg_prop = runif(n, min = 0.4, max = 0.5)) %>% 
    mutate(leg_left  = leg_prop * height + rnorm(n, mean = 0, sd = 0.02),
           leg_right = leg_prop * height + rnorm(n, mean = 0, sd = 0.02))
  
  stan(fit = stanfit(u6.1), data = NewLegs, seed = seed, ...) 
}

Now we use sim_and_fit() to run our simulations for seed values 1:4. By mapping sim_and_fit() over the seed values with purrr::map(), the results get saved within our tibble, Sims. Since this takes a while to run, we will save the results to a file. The next time we come this way, if the file exists, we will just load the results rather than recompute them.

filename <- 'sims/sim_6.1.1.Rds'
if (file.exists(filename)) {
  Sims <- readRDS(filename)
} else {
  Sims <-
    tibble(seed = 1:4) %>% 
    mutate(post = purrr::map(seed, ~sim_and_fit(.) %>% 
                             posterior_samples()))
  saveRDS(Sims, file = filename)
}

Take a look at what we did.

head(Sims)
## # A tibble: 4 x 2
##    seed post                
##   <int> <list>              
## 1     1 <df[,5] [4,000 × 5]>
## 2     2 <df[,5] [4,000 × 5]>
## 3     3 <df[,5] [4,000 × 5]>
## 4     4 <df[,5] [4,000 × 5]>

We have a tibble with four rows and two columns. Hopefully it’s unsurprising that the first column shows our four seed values. The second column, post, might look odd. If you look back up at the code we used to make Sims, you’ll notice we fed the results from sim_and_fit() into the brms::posterior_samples() function. Each model object contained 4,000 posterior draws. There are four parameters in the model and, by default, posterior_samples() also returns lp__, which we generally ignore in this project. Anyway, that’s why each of the four rows of post contains <df[,5] [4,000 × 5]>: each cell holds an entire \(4,000 \times 5\) data frame, the results from posterior_samples(). When you have data frames nested within a data frame, this is called a nested data structure. To unpack the contents of those four nested data frames, you simply call unnest(Sims, post) or Sims %>% unnest(post). In the next code block, we’ll do that within the context of a larger workflow designed to plot the results in a faceted coefficient plot. For data with this kind of structure, the tidybayes::stat_pointinterval() function (wrapped in a gf_pointinterval() helper below) will be particularly useful.

library(tidybayes)

gf_pointinterval <-
  layer_factory(
    geom = tidybayes::GeomPointinterval, stat = ggdist::StatPointinterval,
    aes_form = y ~ x)

Sims %>% 
  unnest(post) %>% 
  pivot_longer(b0:sigma) %>% 
  mutate(seed = paste("seed", seed)) %>% 
  gf_pointinterval(name ~ value, .width = 0.95, color = "forestgreen") %>%
  gf_facet_wrap(~ seed, ncol = 1) %>%
  gf_labs(x = "posterior", y = NULL) %>%
  gf_theme(axis.text.y = element_text(hjust = 0),
        panel.border = element_rect(color = "black", fill = "transparent"),
        panel.grid.minor = element_blank(),
        strip.text = element_text(hjust = 0)) 

Though the results varied across iterations, the overall pattern was massive uncertainty in the two \(\beta\) parameters. You may be wondering,

Did the posterior approximation work correctly?

It did work correctly, and the posterior distribution here is the right answer to the question we asked. The problem is the question. Recall that a multiple linear regression answers the question: What is the value of knowing each predictor, after already knowing all of the other predictors? So in this case, the question becomes: What is the value of knowing each leg’s length, after already knowing the other leg’s length?

The answer to this weird question is equally weird, but perfectly logical. (p. 164, emphasis in the original)

To further answer this weird question, we might start making the panels from Figure 6.2. Starting with the left panel, this is perhaps the simplest way to plot the bivariate posterior of our two predictor coefficients.

mcmc_pairs(stanfit(u6.1), pars = c("bl", "br"))

We can also create posterior samples and use those to make plots.

Post <- u6.1 %>% extract.samples() %>% as.data.frame()
# or Post <- u6.1 %>% stanfit() %>% as.data.frame()
  
Post %>% 
  gf_point(br ~ bl, color = "forestgreen", alpha = 1/10, size = 1/2)

While we’re at it, you can make a similar plot with the mcmc_scatter() function [see @gabryPlottingMCMCDraws2019, [*Plotting MCMC draws using the bayesplot package*](https://CRAN.R-project.org/package=bayesplot/vignettes/plotting-mcmc-draws.html)].

Post %>% 
  mcmc_scatter(pars = c("bl", "br"),
               size = 1/2, 
               alpha = 1/10)

But wow, those coefficients look about as highly correlated as the predictors, just with the reversed sign.

Post %>% 
  select(bl:br) %>% 
  cor()
##            bl         br
## bl  1.0000000 -0.9996951
## br -0.9996951  1.0000000

On page 165, McElreath clarified that this model may as well be

\[\begin{align*} y_i & \sim {\sf Norm}(\mu_i, \sigma) \\ \mu_i & = \alpha + (\beta_1 + \beta_2) x_i. \end{align*}\]

So let’s look at the posterior for bl + br:

library(tidybayes)

gf_halfeye <- 
  layer_factory(
    stat = ggdist:::StatSampleSlabinterval, geom = tidybayes::GeomSlabinterval,
    aes_form = y ~ x)

Post %>% 
  gf_halfeye(
    0 ~ (bl + br), 
    point_interval = mean_hdi, 
    fill = "steelblue", .width = .95) %>%
  gf_labs(title = "Sum of the multicollinear coefficients",
       subtitle = "Marked by the mean and 95% HDPI") %>%
  gf_refine(
    scale_y_continuous(NULL, breaks = NULL) 
  )
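
If you want the numbers behind that plot, tidybayes can compute the same point estimate and interval directly. A sketch (output not shown):

Post %>%
  mutate(bl_plus_br = bl + br) %>%
  mean_hdi(bl_plus_br, .width = 0.95)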

Model 6.2: One leg

Now we fit the revised model after ditching one of the leg lengths.

u6.2 <- 
  ulam(
    data = Legs, 
    alist(
      height ~ dnorm(mu, sigma),
      mu <- b0 + bl * leg_left,
      b0 ~ dnorm(10,100),        # don't mimic these priors!
      bl ~ dnorm(2, 10),
      sigma ~ dexp(1)
    ),
    chains = 4,         # run from 4 different starting points
    iter = 2000,        # use 2000 iterations per chain
    warmup = 1000,      # use 1000 iterations to tune parameters (per chain)
    cores = 4,          # spread computation over 4 cores
    seed = 6,           # set seed used for randomization in HMC
    file = "fits/u6.2"  # load model from a file if it exists; create file if it does not
  )
precis(u6.2)
##            mean         sd      5.5%     94.5%    n_eff     Rhat4
## b0    1.0035205 0.29284056 0.5387524 1.4655665 1348.848 1.0005882
## bl    1.9903927 0.06309714 1.8917949 2.0920675 1366.645 1.0004617
## sigma 0.6316664 0.04636479 0.5633310 0.7119578 1480.276 0.9999221

That posterior \(SD\) for leg_left looks much better. Compare this density to the one in Figure 6.2.b.
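
If you want to draw that posterior density of bl yourself for the comparison, something like this should do it (a sketch):

u6.2 %>%
  extract.samples() %>%
  as.data.frame() %>%
  gf_dens(~ bl, color = "forestgreen")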

Comparing models 6.1 and 6.2

The basic lesson is only this: When two predictor variables are very strongly correlated (conditional on other variables in the model), including both in a model may lead to confusion. The posterior distribution isn’t wrong, in such cases. It’s telling you that the question you asked cannot be answered with these data. And that’s a great thing for a model to say, that it cannot answer your question. And if you are just interested in prediction, you’ll find that this leg model makes fine predictions. It just doesn’t make any claims about which leg is more important. (p. 166)

stanfit(u6.1) %>% as.data.frame() %>% head()
##          b0         bl         br     sigma      lp__
## 1 0.6250854  0.4684651  1.5888846 0.6208246 -4.430907
## 2 1.3275858 -1.7912004  3.7265424 0.6036564 -4.880740
## 3 1.1490153  2.9544048 -0.9876128 0.6695279 -4.433873
## 4 1.1490981  2.9556700 -0.9849575 0.6695494 -4.593386
## 5 1.4469277  4.4776990 -2.5982230 0.6474863 -6.576340
## 6 1.3255334  5.9460978 -4.0451483 0.6594375 -7.553118
Ind6.1 <- sim(u6.1) %>%
  apply(2, mean_hdi) %>%
  bind_rows() %>%
  bind_cols(Legs)
Ind6.2 <- sim(u6.2)  %>%
  apply(2, mean_hdi) %>%
  bind_rows() %>%
  bind_cols(Legs)

Predictions <-
  tibble(
    u6.1 = Ind6.1$y,
    u6.2 = Ind6.2$y
  )
gf_point(u6.2 ~ u6.1, data = Predictions)

gf_histogram( ~ (u6.2 - u6.1), data = Predictions)
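
If you prefer a number to a picture, you could also summarize how closely the two models’ predictions agree. A sketch (output not shown):

Predictions %>%
  summarise(
    cor          = cor(u6.1, u6.2),        # should be essentially 1
    max_abs_diff = max(abs(u6.2 - u6.1))   # should be small relative to height
  )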

Milk

Multicollinearity arises in real data, too.

data(milk, package = "rethinking")

Now use the handy rethinking::standardize() function to standardize our focal variables.

Milk <-
  milk %>% 
  mutate(k = rethinking::standardize(kcal.per.g),
         f = rethinking::standardize(perc.fat),
         l = rethinking::standardize(perc.lactose)) %>%
  select( - matches("\\."))   # ulam() doesn't like periods in variable names
names(Milk)
## [1] "clade"   "species" "mass"    "k"       "f"       "l"
# k regressed on f
u6.3 <- 
  ulam(
    data = Milk,
    alist(
      k ~ dnorm(mu, sigma),
      mu <- b0 + bf * f,
      b0 ~ dnorm(0, 0.2),
      bf ~ dnorm(0, 0.5),
      sigma ~ dexp(1)
    ),
    iter = 2000, warmup = 1000, chains = 4, cores = 4,
    seed = 6,
    file = "fits/u6.3")

# k regressed on l
u6.4 <- 
  ulam(
    data = Milk,
    alist(
      k ~ dnorm(mu, sigma),
      mu <- b0 + bl * l,
      b0 ~ dnorm(0, 0.2),
      bl ~ dnorm(0, 0.5),
      sigma ~ dexp(1)
    ),
    iter = 2000, warmup = 1000, chains = 4, cores = 4,
    seed = 6,
    file = "fits/u6.4")
precis(u6.3) 
##               mean         sd       5.5%     94.5%    n_eff     Rhat4
## b0    0.0001183386 0.08204103 -0.1315856 0.1332716 3480.209 1.0001097
## bf    0.8565499606 0.09063418  0.7095483 0.9997394 3380.035 0.9995976
## sigma 0.4867323051 0.06635529  0.3906409 0.6024127 3211.441 0.9999634
precis(u6.4) 
##               mean         sd       5.5%      94.5%    n_eff     Rhat4
## b0     0.000361488 0.07324321 -0.1186982  0.1180306 3315.626 0.9990835
## bl    -0.898332361 0.07935039 -1.0262626 -0.7710048 3831.419 1.0000656
## sigma  0.412635207 0.05837189  0.3324855  0.5129537 3030.955 0.9997075

Now “watch what happens when we place both predictor variables in the same regression model” (p. 167).

u6.5 <- 
  ulam(
    data = Milk,
    alist(
      k ~ dnorm(mu, sigma),
      mu <- b0 + bl * l + bf * f,
      b0 ~ dnorm(0, 0.2),
      bl ~ dnorm(0, 0.5),
      bf ~ dnorm(0, 0.5),
      sigma ~ dexp(1)
    ),
    iter = 2000, warmup = 1000, chains = 4, cores = 4,
    seed = 6,
    file = "fits/u6.5")

Now the \(\beta\)’s are smaller and less certain.

precis(u6.5)
##               mean         sd        5.5%      94.5%    n_eff    Rhat4
## b0     0.002459974 0.07423371 -0.11770770  0.1211462 2612.598 1.000329
## bl    -0.658345000 0.19893639 -0.96856317 -0.3352239 1353.622 1.002024
## bf     0.260630291 0.19900264 -0.05620976  0.5876586 1482.648 1.002344
## sigma  0.416147242 0.06087605  0.33070309  0.5186209 2284.932 0.999747
u6.5 %>% 
  stanfit() %>%
  mcmc_intervals(pars = vars(-lp__)) %>%
  gf_vline(xintercept = ~0, color = "red", linetype = "dotted")

milk %>% select(-species) %>%
  GGally::ggpairs()
## (GGally startup message and repeated warnings omitted: 12 rows with missing
## values were removed from several panels)

Our two predictor “variables are negatively correlated, and so strongly so that they are nearly redundant. Either helps in predicting kcal.per.g, but neither helps much once you already know the other” (p. 169, emphasis in the original).
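
A quick numerical check of that claim (a sketch; output not shown, but you should see a strong negative correlation between f and l):

Milk %>%
  select(f, l) %>%
  cor() %>%
  round(digits = 4)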

Anyway, making a DAG might help us make sense of this.

library(ggdag)

dag_coords <-
  tibble(name = c("L", "D", "F", "K"),
         x    = c(1, 2, 3, 2),
         y    = 3 - c(2, 2, 2, 1))

milk_dag <-
  dagify(L ~ D,
         F ~ D,
         K ~ L + F,
         coords = dag_coords)

milk_dag %>% drawdag()

The central tradeoff decides how dense, \(D\), the milk needs to be. We haven’t observed this variable, so it’s shown circled. Then fat, \(F\), and lactose, \(L\), are determined. Finally, the composition of \(F\) and \(L\) determines the kilocalories, \(K\). If we could measure \(D\), or had an evolutionary and economic model to predict it based upon other aspects of a species, that would be better than stumbling through regressions. (p. 167)

Rethinking: Identification guaranteed; comprehension up to you.

In case you have heard the word “identifiable” in another course:

technically speaking, identifiability is not a concern for Bayesian models. The reason is that as long as the posterior distribution is proper–which just means that it integrates to 1–then all of the parameters are identified. But this technical fact doesn’t also mean that you can make sense of the posterior distribution. So it’s probably better to speak of weakly identified parameters in a Bayesian context. (pp. 169–170, emphasis in the original)

Inspecting Stan models with bayesplot

The bayesplot package provides lots of functions for inspecting our models. Most of these we could build ourselves pretty easily, but sometimes it is nice to let others do the work. We have already seen a few of these plots above. Here are some more examples. Feel free to experiment or to look at the documentation to learn about what each plot does.

apropos('mcmc_')
##  [1] "example_mcmc_draws"     "from_ggmcmc_names"      "mcmc_acf"              
##  [4] "mcmc_acf_bar"           "mcmc_areas"             "mcmc_areas_data"       
##  [7] "mcmc_areas_ridges"      "mcmc_areas_ridges_data" "mcmc_combo"            
## [10] "mcmc_dens"              "mcmc_dens_chains"       "mcmc_dens_chains_data" 
## [13] "mcmc_dens_overlay"      "mcmc_hex"               "mcmc_hist"             
## [16] "mcmc_hist_by_chain"     "mcmc_intervals"         "mcmc_intervals_data"   
## [19] "mcmc_neff"              "mcmc_neff_data"         "mcmc_neff_hist"        
## [22] "mcmc_nuts_acceptance"   "mcmc_nuts_divergence"   "mcmc_nuts_energy"      
## [25] "mcmc_nuts_stepsize"     "mcmc_nuts_treedepth"    "mcmc_pairs"            
## [28] "mcmc_parcoord"          "mcmc_parcoord_data"     "mcmc_plot"             
## [31] "mcmc_rank_hist"         "mcmc_rank_overlay"      "mcmc_recover_hist"     
## [34] "mcmc_recover_intervals" "mcmc_recover_scatter"   "mcmc_rhat"             
## [37] "mcmc_rhat_data"         "mcmc_rhat_hist"         "mcmc_scatter"          
## [40] "mcmc_trace"             "mcmc_trace_data"        "mcmc_trace_highlight"  
## [43] "mcmc_violin"            "to_ggmcmc_names"

Example plots

u6.5 %>%
  stanfit() %>%
  mcmc_pairs(pars = vars(-lp__))

Note: the posterior for \(\sigma\) looks a little bit skewed. That’s an indication that we will be outgrowing quap() – this isn’t even a very complicated model.

u6.5 %>%
  stanfit() %>%
  mcmc_areas(pars = vars(-lp__))

u6.5 %>%
  stanfit() %>%
  mcmc_intervals(pars = vars(-lp__)) %>%
  gf_vline(xintercept = ~0, color = "red", linetype = "dotted")

u6.5 %>%
  stanfit() %>%
  mcmc_trace()

u6.5 %>%
  stanfit() %>%
  mcmc_trace_highlight()
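
Here are two more from the apropos() list above, just to show the flavor (sketches; try some of the others as well):

u6.5 %>%
  stanfit() %>%
  mcmc_dens_overlay(pars = vars(-lp__))

u6.5 %>%
  stanfit() %>%
  mcmc_rank_overlay(pars = vars(-lp__))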

Just the data

Note: The functions that end in _data() return data instead of a plot. So you can use those to get a nice tidy data frame of the sort needed for these plots, and then create your own custom plot from that data.

u6.5 %>%
  stanfit() %>%
  mcmc_intervals_data(pars = vars(-lp__)) %>%
  head()
## # A tibble: 4 x 9
##   parameter outer_width inner_width point_est      ll       l        m       h
##   <fct>           <dbl>       <dbl> <chr>       <dbl>   <dbl>    <dbl>   <dbl>
## 1 b0                0.9         0.5 median    -0.122  -0.0463  0.00320  0.0520
## 2 bl                0.9         0.5 median    -0.983  -0.793  -0.661   -0.530 
## 3 bf                0.9         0.5 median    -0.0645  0.127   0.258    0.391 
## 4 sigma             0.9         0.5 median     0.328   0.372   0.412    0.451 
## # … with 1 more variable: hh <dbl>
u6.5 %>%
  stanfit() %>%
  mcmc_intervals_data(pars = vars(-lp__)) %>%
  gf_pointrange(parameter ~ m + l + h, size = 3, color = "skyblue") %>%
  gf_pointrange(parameter ~ m + ll + hh, size = 1, color = "red")