Overall

Pay attention to your use of terms and pronouns (avoiding the latter unless it is clear what they refer to).

Keep paying attention to little words like OF (distribution of, mean of, etc.).

Question 1

Part 1

\(\nu\) is the Greek letter “nu”. It is the traditional letter used for the degrees of freedom in a t-distribution. (I don’t know why that letter is used – probably someone did it in a paper and it stuck.)

While a parameter may express something about the population being studied, the posterior distribution is not about the population – it is about the model’s posterior belief about the parameter.

The normality assessment here is inconclusive – there are large portions of the posterior distribution for \(\nu\) both above and below 30. That’s not surprising with such a small sample. There is certainly not enough evidence here to be convinced that things are not normal (even though the mean of the HDI is \(< 1.5\)). Don’t get overly fixated on the posterior mean; always consider the entire posterior distribution. (Note: the majority of the posterior being on one side does not constitute compelling evidence.)

Parts 2 and 3

The \(\log_{10}(\nu) < 1.5\) threshold is a bit arbitrary, but it corresponds roughly to \(\nu < 30\), and beyond 30 degrees of freedom, t-distributions look very much like normal distributions. So that serves as a rough benchmark. Your plots of the normal and t-distributions should have shown this.
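If you want to reproduce that comparison, here is a minimal sketch in base R (the degrees of freedom shown are just illustrative):

# Overlay a standard normal on t-distributions with a few degrees of freedom.
# By nu = 30 the t curve is nearly indistinguishable from the normal.
curve(dnorm(x), from = -4, to = 4, lwd = 2, ylab = "density")
for (nu in c(2, 5, 30)) {
  curve(dt(x, df = nu), add = TRUE, lty = 2)
}
legend("topright", c("normal", "t (nu = 2, 5, 30)"), lty = c(1, 2), lwd = c(2, 1))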

A “weakly informative” prior is not completely uninformative. In particular, a weakly informative prior should

  • enforce basic constraints (probabilities between 0 and 1, standard deviations positive, etc.),

  • get the order of magnitude right, and

  • still allow a fair amount of flexibility.

The paper suggests that the author used a uniform prior for the standard deviations. This is generally not a great idea. It doesn’t match what we know (there isn’t a hard edge between plausible and impossible values for a standard deviation) and it can cause trouble for some numerical algorithms. A uniform distribution could make sense for some other parameters – like for a proportion, where we do know that proportions are constrained to the interval \([0, 1]\).

When choosing an exponential distribution as a prior for a standard deviation, give some thought to the order of magnitude. \(\operatorname{Exp}(1)\) is not always a good choice. Other distributions (e.g., Cauchy, half-normal, half-t) are also used as priors for standard deviation parameters.
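One quick way to check the order of magnitude is to plot candidate priors over the range of plausible values; a sketch, with rates chosen purely for illustration:

# Exp(1) puts nearly all of its mass below about 5. If standard deviations
# around 50 are plausible, something like Exp(0.02) (mean 50) is a better match.
curve(dexp(x, rate = 1), from = 0, to = 100, ylab = "prior density")
curve(dexp(x, rate = 0.1), add = TRUE, lty = 2)    # mean 10
curve(dexp(x, rate = 0.02), add = TRUE, lty = 3)   # mean 50
legend("topright", c("Exp(1)", "Exp(0.1)", "Exp(0.02)"), lty = 1:3)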

Plotting went well for most of you, and there were a number of different approaches used.

Question 2 – logit

This went well for most of you. The key properties of the logit function are that it maps probabilities in \((0, 1)\) onto the entire real line, that it is increasing (so larger probabilities correspond to larger log odds), and that it is invertible – its inverse is the logistic function.

As a bonus, the parameters in logistic regression may be interpretable on a log-odds scale.
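In base R, the logit and its inverse are qlogis() and plogis() (the rethinking package also provides logit() and inv_logit()); a quick demonstration of the properties above:

p <- c(0.01, 0.25, 0.5, 0.75, 0.99)
qlogis(p)            # logit: maps (0, 1) onto the whole real line; 0.5 -> 0
plogis(qlogis(p))    # the inverse (logistic function) recovers p exactly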

Question 3 – T2D

Note: If you took the test in class, there was an error on the printed version. Basically, in model u1, the parameters a[1], a[2], and a[3] were reversed.1 This made u1 look like it was saying some things that didn’t really make sense, but I was mostly able to tell whether you knew what you should know despite that problem.

The answers below will be for the corrected version.

u1
## Hamiltonian Monte Carlo approximation
## 8000 samples from 4 chains
## 
## Sampling durations (seconds):
##         warmup sample total
## chain:1   0.03   0.05  0.08
## chain:2   0.03   0.05  0.08
## chain:3   0.03   0.05  0.08
## chain:4   0.03   0.05  0.08
## 
## Formula:
## cases ~ dbinom(n, p)
## logit(p) <- a[geno_idx] + b * male
## a[geno_idx] ~ dnorm(0, 1.5)
## b ~ dnorm(0, 0.5)
precis(u1, depth = 2)
##             mean         sd        5.5%      94.5%    n_eff    Rhat4
## a[1]  0.41047712 0.24480168  0.02572729  0.8084560 6296.403 1.000396
## a[2]  0.04153843 0.08485679 -0.09349285  0.1780664 4898.635 1.000532
## a[3] -0.28510288 0.06747267 -0.39291791 -0.1770970 3890.285 0.999960
## b     0.30052028 0.08225182  0.16821999  0.4301554 3863.706 1.000199
u1 %>% stanfit() %>% mcmc_areas(regex_pars = "a|b")

u2
## Hamiltonian Monte Carlo approximation
## 8000 samples from 4 chains
## 
## Sampling durations (seconds):
##         warmup sample total
## chain:1   0.02   0.05  0.07
## chain:2   0.02   0.04  0.06
## chain:3   0.02   0.04  0.07
## chain:4   0.02   0.04  0.06
## 
## Formula:
## cases ~ dbinom(n, p)
## logit(p) <- a * Gdose + b * male
## a ~ dnorm(0, 0.5)
## b ~ dnorm(0, 1.5)
u2 %>% stanfit() %>% mcmc_areas(regex_pars = "a|b")

Part a

A negative a means that people with more G’s in their genotype are less likely to have T2D than people of the same sex who have fewer G’s in their genotype.

A positive b means that men with a given genotype are more likely than women with that same genotype to have T2D.
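Exponentiating b turns the log odds into an odds ratio, which some find easier to interpret. A sketch using b = 0.3 purely as an illustrative value (in real work, exponentiate the whole posterior):

b <- 0.3     # illustrative value only
exp(b)       # ~1.35: men's odds of T2D are about 1.35 times women's odds,
             # holding genotype fixed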

Part b

For part ii I was not looking for an improvement on the model; I was looking for an improved answer to the question. Using the full posterior rather than just the posterior means of the parameters is better. We could use link() for this. (link() is better than sim() for the question asked.) This gives us a posterior distribution that we could summarise with something like the mean, but that also allows us to create an interval estimate, which lets us know how precise the model thinks the value is.
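A minimal sketch of that workflow (the new-data values here are hypothetical):

library(rethinking)

# Posterior distribution of p for, say, a male in genotype group 1.
p_post <- link(u1, data = data.frame(geno_idx = 1, male = 1))

mean(p_post)       # a summary number, if you want one
PI(p_post, 0.89)   # an interval: how precise the model thinks p is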

Part c – primary difference between the two models

Model u1 has a parameter for each genotype, so it can independently choose values for each.

Model u2 has one parameter for genotype. This means that the difference in log odds (i.e., the log of the odds ratio) is the same when comparing GG to GT as when comparing GT to TT. In particular, this model must place the proportion of GT subjects with T2D between the proportions for GG and TT; in the first model, they could be in any order. It also means that subjects from each genotype contribute to the estimates for the other genotypes.
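You can see the forced ordering directly from the linear model: each additional G changes the log odds by the same amount a. A sketch using an illustrative value of a (and assuming Gdose counts the number of G’s):

a <- -0.3              # illustrative value only; use the posterior in practice
Gdose <- 0:2           # TT, GT, GG (coding assumed)
plogis(a * Gdose)      # GT is forced to land between TT and GG
diff(a * Gdose)        # identical log-odds steps: a and a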

Part d – which model?

Remember that WAIC and PSIS also have posterior distributions, and the key thing to look at when comparing two models is the posterior distribution of the difference in WAIC or PSIS.
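In rethinking, compare() reports that difference (dWAIC or dPSIS) along with its standard error (dSE), which is the summary of that distribution you need; a sketch:

library(rethinking)

compare(u1, u2)                 # WAIC comparison; look at dWAIC and dSE
compare(u1, u2, func = PSIS)    # the same comparison using PSIS
# A dWAIC that is small relative to dSE is not a compelling difference.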

Part e – WAIC and PSIS

I was pretty generous here, and I wasn’t looking for a detailed description of how WAIC or PSIS is calculated.

Here are the main things I was looking for (with a short code sketch after the list):

  1. Both WAIC and PSIS are trying to estimate how well the model will make out-of-sample (i.e., new data, not used to train the model) predictions.

  2. PSIS approximates LOO (leave-one-out) cross-validation.

    The clever part is avoiding the need to refit all those Bayesian models, which it does by reweighting samples from the posterior of the original model using Pareto-smoothed importance sampling (PSIS).

  3. WAIC rewards a model for fitting the training data better, but penalizes a model for being flexible.

    This attempts to balance overfitting and underfitting. So a model that is much more flexible, but fits the data only a little bit better, will fare worse than a much less flexible model that fits the data only slightly worse.
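In rethinking, both estimates are one function call each, and the pointwise versions let you compute the difference and its standard error yourself; a sketch:

library(rethinking)

WAIC(u1)    # estimate of out-of-sample deviance plus its penalty (pWAIC)
PSIS(u1)    # PSIS-LOO estimate; watch for high Pareto k warnings

# Per-observation values give the distribution of the difference.
w1 <- WAIC(u1, pointwise = TRUE)
w2 <- WAIC(u2, pointwise = TRUE)
dw <- w1$WAIC - w2$WAIC
c(dWAIC = sum(dw), dSE = sqrt(length(dw)) * sd(dw))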

Part f – why wide posterior?

This was worth 0 points because of my goof (see above). Nevertheless, the majority of you were on the right track (though some of you were confused because of the error): In this particular case, there is much less data in one genotype group, so the model can’t estimate the parameter associated with that group as precisely as it can for the others.

Part g – differences

The main issue here is that to tell if things are different we want to look at the posterior distribution of the difference, and you can’t tell what that is just by looking at the two distributions separately. Most of you are getting this most of the time. That’s a good thing. But keep reminding yourself about this.

More generally, keep in mind that essentially everything interesting in a Bayesian model has a posterior distribution. Even when tools like precis() report numbers, they are just reporting a summary of this posterior distribution. The distribution is the thing. So start from that, and then figure out what sort of summary number or plot or whatever you want to compute from the posterior distribution.
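For example, to compare two of the genotype intercepts in u1, start from posterior samples of the difference; a minimal sketch:

library(rethinking)

post <- extract.samples(u1)
d_a <- post$a[, 1] - post$a[, 3]   # posterior distribution of a[1] - a[3]

mean(d_a)       # one possible summary number
PI(d_a, 0.89)   # an interval estimate of the difference
dens(d_a)       # but the distribution itself is the thing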

Part h – multilevel?

Neither model is multilevel. Model u2 does share some information among the genotype groups (the parameter a is affected by people in all three groups), so in that sense there is some pooling of information. But a multilevel model has hyperparameters – parameters that themselves have parameters. We have seen two uses of this: (a) partial pooling for intercepts (slopes would be essentially the same) and (b) modeling errors in measurements.
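For contrast, here is what a multilevel version of u1 might look like (a sketch only, not a model from the exam; it assumes the data live in a list d with cases, n, geno_idx, and male):

library(rethinking)

u1_ml <- ulam(
  alist(
    cases ~ dbinom(n, p),
    logit(p) <- a[geno_idx] + b * male,
    # abar and sigma_a are hyperparameters: the a[j] are partially pooled.
    a[geno_idx] ~ dnorm(abar, sigma_a),
    abar ~ dnorm(0, 1.5),
    sigma_a ~ dexp(1),
    b ~ dnorm(0, 0.5)
  ),
  data = d, chains = 4
)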


  1. How could this happen, you ask? As I was tweaking the exam, I decided to flip my coding of geno_idx to make the wording of 3a easier. Since I was saving the models with file = ..., I deleted them and recompiled the test – or at least that’s what I thought. I must have misclicked so that one of the models wasn’t deleted as I intended. Once I deleted it and recompiled, everything was back to what I was expecting.↩︎