Stat 341
Computational Bayesian Statistics
Spring 2021



Homework due after Test 2

Due Task Source Problems, etc.
W 4/14 PS 14 Rethinking 7E3 (entropy of die) • 7M4 (effective number of parameters) • 7M5-6 (informative priors) • 7H1-2 (Laffer data) • 7H5 (foxes)
T 4/20 PS 15 Rethinking 11E1-2 (log-odds) • 11E3, 11M3 (logit link) • 11H2 (eagles) • 11H4 (NWO)
S 4/24 PS 16 Rethinking 13E2-3 (make multi-level) • 13M1 (4 frog models) • 13M2 (4 frog models) • 13M6 (Normal and t)
W 4/28 PS 17 Rethinking 13H1 (contraceptive use) • 15H4 (Primates301)

Some Code and Comments

For PS 17 – 15H4

Don’t mimic the author’s code for modifying the data set – we have nicer ways:

library(rethinking)
library(dplyr)
library(tidyr)
library(ggformula)   # for gf_point() used below
library(patchwork)   # for combining plots with |
data("Primates301")
Primates <-
  Primates301 %>%
  drop_na(body, brain) %>%                  # keep only rows with both body and brain
  mutate(
    M = body / max(body),                   # body mass rescaled so the max is 1
    B = brain / max(brain, na.rm = TRUE),   # brain volume rescaled the same way
    Bse = 0.1 * B,                          # 10% measurement error, per the problem
    Mse = 0.1 * M,
    species_idx = as.numeric(factor(species))
  )
glimpse(Primates)
## Rows: 182
## Columns: 21
## $ name                <fct> Allenopithecus_nigroviridis, Alouatta_belzebul, Al…
## $ genus               <fct> Allenopithecus, Alouatta, Alouatta, Alouatta, Alou…
## $ species             <fct> nigroviridis, belzebul, caraya, guariba, palliata,…
## $ subspecies          <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ spp_id              <int> 1, 3, 4, 5, 6, 7, 8, 9, 10, 14, 18, 21, 22, 23, 24…
## $ genus_id            <int> 1, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 6, 6, 7, 7, 7, 7,…
## $ social_learning     <int> 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, NA, 0, 0, 0, 2, 0…
## $ research_effort     <int> 6, 15, 45, 37, 79, 25, 4, 82, 22, 16, 58, NA, 1, 1…
## $ brain               <dbl> 58.02, 52.84, 52.63, 51.70, 49.88, 51.13, 59.08, 5…
## $ body                <dbl> 4655.00, 6395.00, 5383.00, 5175.00, 6250.00, 8915.…
## $ group_size          <dbl> 40.00, 7.40, 8.90, 7.40, 13.10, 5.50, NA, 7.90, 4.…
## $ gestation           <dbl> NA, NA, 185.92, NA, 185.42, 185.92, NA, 189.90, NA…
## $ weaning             <dbl> 106.15, NA, 323.16, NA, 495.60, NA, NA, 370.04, 22…
## $ longevity           <dbl> 276.0, NA, 243.6, NA, 300.0, 240.0, NA, 300.0, NA,…
## $ sex_maturity        <dbl> NA, NA, 1276.72, NA, 1578.42, NA, NA, 1690.22, NA,…
## $ maternal_investment <dbl> NA, NA, 509.08, NA, 681.02, NA, NA, 559.94, NA, 20…
## $ M                   <dbl> 0.035807692, 0.049192308, 0.041407692, 0.039807692…
## $ B                   <dbl> 0.11810206, 0.10755796, 0.10713050, 0.10523745, 0.…
## $ Bse                 <dbl> 0.011810206, 0.010755796, 0.010713050, 0.010523745…
## $ Mse                 <dbl> 0.0035807692, 0.0049192308, 0.0041407692, 0.003980…
## $ species_idx         <dbl> 110, 20, 28, 65, 116, 123, 140, 144, 17, 79, 155, …

Log-Normal distributions often confuse people. Remember that

\[ B \sim {\sf LogNormal}(\mu, \sigma) \]

just means that

\[ \log(B) \sim {\sf Norm}(\mu, \sigma) \]

In particular, \(\mu\) is not the mean of the distribution of \(B\); it is the mean of the distribution of \(\log B\). So the model presented here is essentially the same as using \(\log B\) as the response with a normal likelihood.
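
If you want to convince yourself of this, here is a quick sanity check using made-up values for \(\mu\), \(\sigma\), and a point x (none of these numbers come from the problem):

# made-up values, just to check the relationship between the two densities
mu <- -2; sigma <- 0.5
x <- 0.35

# the log-normal density at x is the normal density of log(x), divided by x
dlnorm(x, mu, sigma)
dnorm(log(x), mu, sigma) / x

# and the log of log-normal draws is (approximately) normal
set.seed(341)
draws <- rlnorm(1e4, mu, sigma)
c(mean(log(draws)), sd(log(draws)))   # close to mu and sigma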

You can see from these plots that \(\log B\) vs. \(\log M\) is approximately linear while \(B\) vs \(\log M\) is not.

# the | operator (from patchwork) places the plots side by side
Primates %>% gf_point(B ~ M) |
  Primates %>% gf_point(B ~ log(M)) |
  Primates %>% gf_point(log(B) ~ log(M))

Be sure to keep that in mind as you work on this problem.

And if you want to save a little time typing, here is m15H4 (incorrectly referred to as m1.1).

m15H4 <-
  ulam(
    data = Primates %>% select(brain, body, B, M, Bse, Mse, species),
    alist(
      B ~ dlnorm(mu, sigma),    # log-normal likelihood for rescaled brain volume
      mu <- a + b * log(M),     # linear model on the log scale
      a ~ dnorm(0, 1),
      b ~ dnorm(0, 1),
      sigma ~ dexp(1)
      ),
    iter = 4000, chains = 4, cores = 4,
    file = "fits/m15H4"         # cache the fit so it isn't re-run every time
  )
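
Once that fit exists (or has been read back from the cached file), a quick summary of the posterior takes one line:

precis(m15H4)        # posterior means, sds, and intervals for a, b, sigma
# traceplot(m15H4)   # uncomment to check that the chains mixed well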

Note: There are a few ulam/Stan things to watch out for in the updated (measurement error) model.

  • You will need to declare B_true and M_true as vectors of length 182. You could also pass that length in as part of the data, but then you would need to use a list instead of a data frame, since every column of a data frame must have the same length. [McElreath uses a list so that he can include an item N holding this value.]

    For our purposes in this model, you can just hard-code the number 182, or you can supply the data as a list that includes N = 182. That’s up to you.

  • Notice in the book code for m15.2 that there is an extra [i] on the line defining mu. Apparently, the ulam to Stan conversion isn’t able to figure that out for us. You will probably need something similar in your model.

  • In m15.2, instead of using an index variable for species, McElreath just uses the row index i, since each row is a different species. Presumably this could also be done with a species index variable – we’d just need to experiment to figure out how the information gets passed along to Stan. We might need to use something like species_idx[i] in places where we would normally write just species_idx. A sketch of how the updated model might be put together follows this list.
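
Putting those pieces together, here is one way the updated model could be written. Treat it as a rough sketch rather than the official answer: the name m15H4me, the priors on B_true and M_true, and the start values are my guesses, not something the problem dictates.

# a sketch of the measurement-error version -- priors and start values
# below are guesses you may well want to revisit
m15H4me <-
  ulam(
    data = Primates %>% select(B, M, Bse, Mse),
    alist(
      # observed values are noisy measurements of the true values
      B ~ dnorm(B_true, Bse),
      vector[182]:B_true ~ dlnorm(mu, sigma),
      mu <- a + b * log(M_true[i]),        # note the explicit [i]
      M ~ dnorm(M_true, Mse),
      vector[182]:M_true ~ dlnorm(0, 1),   # prior for the true masses (a guess)
      a ~ dnorm(0, 1),
      b ~ dnorm(0, 1),
      sigma ~ dexp(1)
    ),
    start = list(B_true = Primates$B, M_true = Primates$M),
    iter = 4000, chains = 4, cores = 4
  )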

In case you are curious, here is how m15H4 above looks to Stan:

m15H4 %>% stancode()
data{
    vector[182] Mse;
    vector[182] Bse;
    vector[182] B;
    vector[182] M;
}
parameters{
    real a;
    real b;
    real<lower=0> sigma;
}
model{
    vector[182] mu;
    sigma ~ exponential( 1 );
    b ~ normal( 0 , 1 );
    a ~ normal( 0 , 1 );
    for ( i in 1:182 ) {
        mu[i] = a + b * log(M[i]);
    }
    B ~ lognormal( mu , sigma );
}

Notice the for (i in 1:182) loop and the insertion of the index variable i. For simple models like this, ulam() seems to figure out the number 182 and add the indexing for us. But for the more complicated model, we need to give it some assistance.
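
If you end up writing the measurement-error model yourself, it is worth doing the same check on it: look at the generated Stan code and make sure the vector[182] declarations and the explicit [i] indexing came through the way you intended. Assuming you named the fit m15H4me as in the sketch above, that is just

m15H4me %>% stancode()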