Stat 341
Computational Bayesian Statistics
Spring 2019


Test Info

Information about tests will be posted here as it becomes available.


Test 1

Date: Monday, March 4

Coverage: DBDA, chapters 1-8.

Topics

Probability (especially conditional probability)

definition of conditional probability; independence; relating P(A|B) to P(B|A) (Bayes' Theorem); connection to Bayesian data analysis.
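
For example, here is a quick R sketch of relating P(A|B) to P(B|A) via Bayes' Theorem, using made-up numbers for a diagnostic test:

    # Hypothetical test: 95% sensitivity, 90% specificity, 2% prevalence.
    p_D <- 0.02                # P(D): prevalence of the disease
    p_pos_D <- 0.95            # P(+ | D): sensitivity
    p_pos_notD <- 0.10         # P(+ | no D): 1 - specificity

    # total probability: P(+) = P(+ | D) P(D) + P(+ | no D) P(no D)
    p_pos <- p_pos_D * p_D + p_pos_notD * (1 - p_D)

    # Bayes' Theorem: P(D | +) = P(+ | D) P(D) / P(+)
    p_pos_D * p_D / p_pos      # about 0.16 -- most positives are false positives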

Distributions

pmf and pdf, area = probability; commonly used distributions (Beta, Uniform, Triangle, Normal, Bernoulli, Binomial) and their parameterizations.

R functions for working with distributions: gf_dist(); beta_params(); dnorm(), pnorm(), qnorm(), rnorm(), and similar for other distributions.
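
As a reminder, the d/p/q/r functions follow one naming pattern for every distribution (shown here for the normal; the same pattern works for beta, binom, etc.):

    dnorm(1)           # density (pdf) at 1
    pnorm(1)           # P(X <= 1): area to the left of 1
    qnorm(0.975)       # value with 97.5% of the area to its left (about 1.96)
    rnorm(5)           # 5 random draws

    library(ggformula)
    gf_dist("beta", shape1 = 4, shape2 = 6)   # plot a Beta(4, 6) density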

The Bayesian Data Analysis Framework

prior, likelihood, posterior, normalization (the puppies); interpreting priors and posteriors; HDI; posterior probabilities; describing Bayesian models in mathematical notation.
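
For example, a Beta-Bernoulli model described in mathematical notation (the prior parameters 4 and 6 are illustrative, not special):

\[
\begin{align*}
y_i    &\sim {\sf Bern}(\theta)  && \mbox{[likelihood]} \\
\theta &\sim {\sf Beta}(4, 6)    && \mbox{[prior]}
\end{align*}
\]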

Grid Method

How it works; obtaining the distribution of a single parameter from a grid that involves multiple parameters; generating posterior samples from a grid.

Useful R functions: seq(), expand.grid(), %>%, mutate(), hdi_from_grid(), group_by(), mosaic::resample().
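
For example, a minimal grid-method sketch (Beta(1, 1) prior, 7 successes in 10 Bernoulli trials) that exercises most of those functions:

    library(dplyr)
    library(mosaic)

    Grid <-
      expand.grid(theta = seq(0, 1, by = 0.001)) %>%
      mutate(
        prior      = dbeta(theta, 1, 1),
        likelihood = dbinom(7, size = 10, prob = theta),
        posterior  = prior * likelihood,
        posterior  = posterior / sum(posterior)   # normalize so the grid sums to 1
      )

    # posterior samples: resample grid rows with probability
    # proportional to the posterior
    Posterior <- resample(Grid, size = 5000, prob = Grid$posterior)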

Exact Bayesian Analysis

Case study: Beta prior and Bernoulli likelihood.
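
The key conjugacy fact: a Beta(a, b) prior combined with x successes in n Bernoulli trials yields a Beta(a + x, b + n - x) posterior. A quick check in R (numbers hypothetical):

    a <- 4; b <- 6      # prior: Beta(4, 6)
    x <- 7; n <- 10     # data: 7 successes in 10 trials

    # posterior is Beta(a + x, b + n - x) -- no grid or sampling required
    qbeta(c(0.025, 0.975), a + x, b + n - x)   # central 95% credible interval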

Posterior Sampling

Metropolis Algorithm; Gibbs Sampling; JAGS via the R2jags package; sampling from a prior (how and why).

Useful R functions: jags(), metro_bern(), as.mcmc(), posterior(), various plotting functions.
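
For intuition, here is a bare-bones Metropolis sampler for the Beta-Bernoulli posterior. (This is just an illustration written for this page, not the course's metro_bern().)

    metro <- function(x, n, a = 1, b = 1, steps = 5000, step_size = 0.1) {
      # target: posterior, proportional to likelihood * prior
      post <- function(theta) {
        if (theta <= 0 || theta >= 1) return(0)
        theta^x * (1 - theta)^(n - x) * dbeta(theta, a, b)
      }
      theta <- numeric(steps)
      theta[1] <- 0.5                    # arbitrary starting value
      for (i in 2:steps) {
        proposal <- theta[i - 1] + rnorm(1, 0, step_size)  # symmetric proposal
        # accept with probability min(1, posterior ratio); otherwise stay put
        if (runif(1) < post(proposal) / post(theta[i - 1])) {
          theta[i] <- proposal
        } else {
          theta[i] <- theta[i - 1]
        }
      }
      theta
    }

    draws <- metro(x = 7, n = 10, a = 4, b = 6)
    hist(draws)    # should resemble the exact Beta(11, 9) posterior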


Test 2

Date: Friday, April 5

Coverage: The test is cumulative but will emphasize the more recent material from DBDA, chapters 7-9, 14-17. (Not all topics in Chapter 15 were covered – see notes.)

Topics

In addition to the Test 1 topics, which you are still expected to know, here is a list of topics for Test 2.

Generalized Linear Model (GLM) framework

Taxonomy of Variables
  • Explanatory (predictor) and response (predicted)
  • metric, count, dichotomous, nominal, ordinal
    • converting categorical variables to numbers for JAGS and Stan
response = “average” + “noise”
Creating models
  • specifying a model with formulas and/or diagrams;
  • converting back and forth between model specifications and JAGS [or Stan] code (see the sketch after this list).
  • interpreting parameters in the model (including derived parameters like the difference in means)
  • selecting priors for parameters in the model
Specific situations:
  • dichotomous ~ dichotomous
  • metric ~ dichotomous
  • metric ~ metric
  • metric ~ metric + dichotomous
  • metric ~ metric + metric
  • paired comparisons vs group-wise comparisons (e.g., extra sleep ~ drug case study)
  • centering and standardization (how and why)
Using transformations (how and why)
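
As an example of turning a model specification into JAGS code, here is one way to write metric ~ dichotomous (priors and scales hypothetical; written as an R function for use with R2jags):

    # y[i]: metric response; group[i]: dichotomous predictor coded 1 or 2
    groups_model <- function() {
      for (i in 1:N) {
        y[i] ~ dnorm(mu[group[i]], 1 / sigma^2)  # JAGS uses precision, not sd
      }
      mu[1] ~ dnorm(0, 1 / 10^2)    # weakly informative priors (hypothetical scale)
      mu[2] ~ dnorm(0, 1 / 10^2)
      sigma ~ dunif(0, 20)
      delta <- mu[2] - mu[1]        # derived parameter: difference in means
    }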

Distributions for Priors and Likelihoods

Commonly used distributions
  • Beta, Uniform, Triangle, Normal, T, Bernoulli, Binomial, Gamma, Exponential
  • parameterizations (in R/JAGS/Stan).
Selecting priors
  • how to choose shape and how to choose parameters (and explain your choice).
  • “uninformative” / “weakly informative” priors

R functions for working with distributions: gf_dist(); beta_params(); gamma_params(); dnorm(), pnorm(), qnorm(), rnorm(), and similar for other distributions.

If you use the word skewed, be sure you use it correctly.

Posterior Sampling

Useful R functions: jags(), sampling(), as.mcmc(), as.mcmc.list(), posterior(), hdi(), plot_post(), various plotting functions.

JAGS
  • high-level understanding of the Gibbs sampler (the method JAGS uses)
  • fitting models via the R2jags package (see the sketch after this list)
  • diagnosing whether JAGS has successfully sampled from (a good approximation to) the posterior distribution
  • sampling from a prior in JAGS (how and why).
Stan
  • High-level understanding of the HMC algorithm (that’s the algorithm Stan uses) and why it works better than Gibbs sampling in some situations.
  • You won’t be required to write Stan code from scratch, but you may choose to use Stan instead of JAGS if not directed otherwise.
  • You may be given Stan code to use/interpret.
Preparing data for JAGS/Stan
  • converting categorical data to numbers (1 and 2, for dichotomous)
Interpreting the posterior distribution
  • plots (mcmc_ plots, plot_post(), custom plots after using posterior())
  • H(P)DI
  • what ROPE stands for
  • how/why to use ROPE
  • how to interpret (the posterior distributions of) the parameters in the models we have seen
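
Continuing the hypothetical two-group model sketched in the GLM section above, a jags() call looks roughly like this:

    library(R2jags)

    fake_data <- list(
      y = c(5.1, 4.8, 6.2, 7.0, 6.8, 7.4),   # made-up metric response
      group = c(1, 1, 1, 2, 2, 2),           # dichotomous predictor coded 1/2
      N = 6
    )

    fit <- jags(
      data = fake_data,
      model.file = groups_model,             # the model function sketched above
      parameters.to.save = c("mu", "sigma", "delta"),
      n.chains = 3, n.iter = 5000
    )

    fit_mcmc <- as.mcmc(fit)   # coda format for diagnostics
    plot(fit_mcmc)             # trace + density plots: check mixing/convergence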

Posterior Predictive Checks

  • What a posterior predictive check is and what it is useful for.
  • Performing posterior predictive checks on a model.
  • Useful R functions: posterior_calc(), the ppc_ functions in bayesplot, CalvinBayes::rstudent_t()
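
A toy posterior predictive check using bayesplot (the posterior draws below are stand-ins; in real use y_rep would be computed from a fitted model, e.g., with posterior_calc()):

    library(bayesplot)

    set.seed(341)
    y <- rnorm(30, mean = 10, sd = 2)         # "observed" data
    mu <- rnorm(200, mean = 10, sd = 0.4)     # stand-in posterior draws
    sigma <- abs(rnorm(200, mean = 2, sd = 0.2))
    # one simulated data set per posterior draw (rows = draws, cols = observations)
    y_rep <- sapply(seq_along(y), function(i) rnorm(200, mu, sigma))

    ppc_dens_overlay(y, y_rep[1:50, ])   # observed density vs 50 simulated sets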

Test 3

Date: Monday, May 6

Coverage: The test is cumulative but will emphasize the more recent material from DBDA, chapters 15-21 and 24. In particular, it will be primarily about Bayesian generalized linear models, how to fit them using brm(), and how to interpret the results.

Main Topics

Generalized Linear Model (GLM) framework

Taxonomy of Variables
  • Explanatory (predictor) and response (predicted)
  • metric, count, dichotomous, nominal, ordinal
Building blocks
  • response = “average” + “noise”; “average” is a linear function of the predictors
    \[
    \begin{align*}
    \mathrm{link}(\mu) &= \mathrm{lin}(x); & y &\sim {\sf Dist}(\mu, \mbox{other parameters}) \\
    \mu &= \mathrm{inv\,link}(\mathrm{lin}(x)); & y &\sim {\sf Dist}(\mu, \mbox{other parameters})
    \end{align*}
    \]
  • recoding predictors
    • dichotomous variables as indicator (0/1) variables
    • nominal variables as multiple indicator variables
  • interaction terms
  • distributions for response (family, “noise”)
  • link functions (see the sketch after this list)
    • logit link for dichotomous response (others possible: probit, robit, etc.)
    • log link for Poisson (count response)
    • log link for heterogeneous variances
    • identity link
  • transforming variables (response or predictors – how and why)
  • hierarchical models (e.g., fields in the fertilizer/tilling study)
  • paired comparisons vs group-wise comparisons (e.g., extra sleep ~ drug case study)
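
Link functions in base R: qlogis() is the logit and plogis() is its inverse (coefficients below are hypothetical):

    beta0 <- -1; beta1 <- 0.5     # hypothetical coefficients
    x <- 3
    eta <- beta0 + beta1 * x      # linear predictor: lin(x)

    plogis(eta)                   # logit link: mu = inverse logit of lin(x)
    qlogis(plogis(eta))           # link(mu) recovers lin(x)
    exp(eta)                      # log link (e.g., Poisson): mu = exp(lin(x))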

Distributions for Priors and Likelihoods

Commonly used distributions
  • Beta, Uniform, Triangle, Normal, T, Bernoulli, Binomial, Gamma, Exponential, Poisson
  • parameterizations (in R/JAGS/Stan).
Selecting priors
  • how to choose shape and how to choose parameters (and explain your choice)
  • “uninformative” / “weakly informative” priors
  • improper uniform priors (commonly used default for brm())
  • R functions for working with distributions: gf_dist(); beta_params(); gamma_params(); dnorm(), pnorm(), qnorm(), rnorm(), and similar for other distributions.

If you use the word skewed, be sure you use it correctly.

Using Stan via the brms package

  • High-level understanding of the HMC algorithm (that’s the algorithm Stan uses) and why it works better than Gibbs sampling in some situations.
  • brm() (see the sketch after this list)
    • formulas to describe models
    • setting priors (set_prior()), inspecting priors (prior_summary())
    • family and link functions (e.g., family = bernoulli(link = logit))
    • data (and how brm() creates new variables for you)
    • extracting information from a brmsfit object: stancode(), stanfit(), posterior(), etc.
    • hierarchical models (e.g., (1 | Field) in the fertilizer/tilling study)
    • setting number of iterations, chains, etc.
  • Using update() can make fitting subsequent models faster by avoiding the compile step.
  • Diagnostics (how to tell whether Stan seems to be working well)
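
Putting several of those pieces together, here is a sketch of a logistic regression fit with brm() (made-up data; the prior and settings are illustrative only):

    library(brms)

    set.seed(341)
    D <- data.frame(x = rnorm(100))
    D$y <- rbinom(100, size = 1, prob = plogis(-1 + 2 * D$x))  # dichotomous response

    model <- brm(
      y ~ x,
      data = D,
      family = bernoulli(link = "logit"),
      prior = set_prior("normal(0, 4)", class = "b"),  # prior on the slope(s)
      chains = 4, iter = 2000
    )

    prior_summary(model)   # which priors (including defaults) were used?
    stancode(model)        # the Stan code brm() generated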

Posterior Distributions

  • H(P)DI
  • what ROPE stands for
  • how/why to use ROPE
  • how to interpret (the posterior distributions of) the parameters in the models we have seen
  • contrasts and other quantities derived from model parameters/coefficients (see the sketch after this list)
  • useful R functions: jags(), sampling(), as.mcmc(), as.mcmc.list(), posterior(), mutate(), hypothesis(), hdi(), plot_post(), marginal_effects(), and various other plotting functions.
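
Computing a contrast from posterior draws (the draws below are stand-ins; in practice the data frame would come from posterior() and the column names would match your model's coefficients):

    library(dplyr)

    set.seed(341)
    Post <- data.frame(                    # stand-in posterior draws
      b_groupA = rnorm(4000, 2.0, 0.3),
      b_groupB = rnorm(4000, 2.6, 0.3)
    )

    Post <- Post %>% mutate(contrast = b_groupB - b_groupA)

    mean(Post$contrast)                        # posterior mean of the contrast
    quantile(Post$contrast, c(0.025, 0.975))   # central 95% credible interval

For a brmsfit object, hypothesis() can compute quantities like this directly, with something like hypothesis(model, "groupB - groupA = 0").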

Comparing models

  • in-sample vs out-of-sample predictive accuracy
  • elpd and its approximations (especially WAIC and LOO)
  • loo(), waic(), compare(), loo_compare()
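
For example, assuming fit1 and fit2 are brmsfit objects (hypothetical names) for two competing models of the same data:

    library(brms)

    loo1 <- loo(fit1)          # PSIS-LOO estimate of elpd
    loo2 <- loo(fit2)
    loo_compare(loo1, loo2)    # differences in elpd, with standard errors

    waic(fit1)                 # WAIC: another approximation to elpd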

Final Exam

In-Class Portion: Tuesday, May 14 @ 9am

  • This will be mostly paper-and-pencil, but bring your laptop in case you need it here or there.

Take-Home Portion: due by noon on Friday, May 17.

  • The exam is available here.

Coverage: The test is cumulative but here are some things you will not need to know:

  • How to write JAGS code. If there is any JAGS, I will give you JAGS code and ask you about it rather than have you write it from scratch.
  • Power calculations. These are very important, but they can be time-consuming and we did not have a lot of time to practice them. You should understand what power is, the general approach to calculating power, and how to interpret a power calculation.
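
To make the general approach concrete, here is a minimal simulate-and-analyze power sketch (all numbers hypothetical: the "world" has theta = 0.65 and n = 50 observations, and the goal is a 95% credible interval entirely above 0.5; the conjugate Beta-Bernoulli update keeps each inner analysis cheap):

    power_sim <- function(reps = 1000, theta = 0.65, n = 50, a = 1, b = 1) {
      hits <- replicate(reps, {
        x <- rbinom(1, size = n, prob = theta)   # 1. simulate data from the
                                                 #    hypothetical world
        lo <- qbeta(0.025, a + x, b + n - x)     # 2. analyze: posterior interval
        lo > 0.5                                 # 3. was the goal achieved?
      })
      mean(hits)   # estimated power: proportion of simulations achieving the goal
    }

    set.seed(341)
    power_sim()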