Test Info

Information about tests will be posted here as it becomes available.

Test 1

Date: Monday, March 4

Coverage: DBDA, chapters 1 - 8.

Topics

Probability (especially conditional probability)

definition of conditional probability; independence; relating P(A|B) to P(B|A) (Bayes' Theorem); connection to Bayesian data analysis.
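As a worked illustration of relating P(A|B) to P(B|A), the sketch below applies Bayes' Theorem to a made-up diagnostic test. The prevalence, sensitivity, and false-positive rate are hypothetical numbers chosen only for illustration; they do not come from the course materials.

```r
# Hypothetical numbers, chosen only for illustration
p_disease         <- 0.01   # P(D): prior probability of disease
p_pos_given_d     <- 0.95   # P(+ | D): sensitivity
p_pos_given_not_d <- 0.05   # P(+ | not D): false-positive rate

# Law of total probability: P(+) = P(+|D) P(D) + P(+|not D) P(not D)
p_pos <- p_pos_given_d * p_disease + p_pos_given_not_d * (1 - p_disease)

# Bayes' Theorem: P(D | +) = P(+ | D) P(D) / P(+)
p_d_given_pos <- p_pos_given_d * p_disease / p_pos
p_d_given_pos   # about 0.16
```

Here the prior P(D) plays the role of the prior distribution, P(+ | D) the likelihood, and P(D | +) the posterior, which is the connection to Bayesian data analysis.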

Distributions

pmf and pdf, area = probability; commonly used distributions (Beta, Uniform, Triangle, Normal, Bernoulli, Binomial) and their parameterizations.

R functions for working with distributions: gf_dist(); beta_params(); dnorm(), pnorm(), qnorm(), rnorm(), and similar for other distributions.
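A short sketch of the d/p/q/r naming pattern, using only base R. The particular distributions and parameter values are arbitrary choices for illustration; gf_dist() and beta_params() come from add-on packages and are not shown here.

```r
# d = density (pdf or pmf), p = cumulative probability,
# q = quantile (inverse of p), r = random draws
dens_at_0 <- dnorm(0, mean = 0, sd = 1)      # height of the Normal(0, 1) pdf at 0
p_below   <- pnorm(1.96, mean = 0, sd = 1)   # P(X <= 1.96): area under the pdf
q_975     <- qnorm(0.975, mean = 0, sd = 1)  # value with 97.5% of the area below it

set.seed(1)
draws <- rnorm(5, mean = 0, sd = 1)          # five random draws

# The same pattern works for other distributions
p_binom   <- pbinom(3, size = 10, prob = 0.5)           # P(X <= 3), Binomial(10, 0.5)
pmf_sum   <- sum(dbinom(0:10, size = 10, prob = 0.5))   # a pmf sums to 1
beta_area <- pbeta(0.5, shape1 = 2, shape2 = 2)         # Beta(2, 2) is symmetric: 0.5
```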

The Bayesian Data Analysis Framework

prior, likelihood, posterior, normalization (the puppies); interpreting priors and posteriors; HDI; posterior probabilities; describing Bayesian models in mathematical notation.

Grid Method

How it works; obtaining distribution of a single parameter from a grid that involves multiple parameters; generating posterior samples from a grid.

Useful R functions: seq(), expand.grid(), %>%, mutate(), hdi_from_grid(), group_by(), mosaic::resample().
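The grid method for a single parameter can be sketched in base R as follows. The data (7 successes in 10 trials) and the flat prior are assumptions made for illustration, and base sample() stands in for mosaic::resample(); this is a minimal sketch, not the course's own implementation.

```r
# Hypothetical data: z successes in N Bernoulli trials
z <- 7
N <- 10

# 1. Grid of candidate parameter values
theta <- seq(0, 1, by = 0.001)

# 2. Prior (flat) times likelihood at each grid point
prior      <- dunif(theta, 0, 1)
likelihood <- theta^z * (1 - theta)^(N - z)

# 3. Normalize so the grid probabilities sum to 1
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)

# 4. Posterior samples: resample grid values using the posterior as weights
set.seed(341)
post_sample <- sample(theta, size = 10000, replace = TRUE, prob = posterior)

post_mode <- theta[which.max(posterior)]   # z / N = 0.7 with a flat prior
post_mean <- mean(post_sample)             # should be near (z + 1) / (N + 2)
```

With more than one parameter, the same steps apply to the rows of an expand.grid() data frame, and the distribution of a single parameter is obtained by summing the posterior over the other parameters.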

Exact Bayesian Analysis

Case study: Beta prior and Bernoulli likelihood.
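For this case study the posterior is available in closed form: a Beta(a, b) prior combined with z successes in N Bernoulli trials gives a Beta(a + z, b + N - z) posterior. The sketch below uses hypothetical values for a, b, z, and N, and checks the closed form against a numerical grid approximation.

```r
a <- 2; b <- 2    # hypothetical Beta(2, 2) prior
z <- 7; N <- 10   # hypothetical data: 7 successes in 10 trials

# Conjugate update: add successes to a, failures to b
a_post <- a + z
b_post <- b + (N - z)

post_mean <- a_post / (a_post + b_post)             # 9 / 14
central95 <- qbeta(c(0.025, 0.975), a_post, b_post) # central 95% interval (not an HDI)

# Numerical check: prior * likelihood, normalized on a fine grid of midpoints
width     <- 0.001
theta     <- seq(width / 2, 1 - width / 2, by = width)
unnorm    <- dbeta(theta, a, b) * theta^z * (1 - theta)^(N - z)
grid_dens <- unnorm / (sum(unnorm) * width)
max_diff  <- max(abs(grid_dens - dbeta(theta, a_post, b_post)))  # should be tiny
```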

Posterior Sampling

Metropolis Algorithm; Gibbs Sampling; JAGS via the R2jags package; sampling from a prior (how and why).

Useful R functions: jags(), metro_bern(), as.mcmc(), posterior(), various plotting functions.
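The Metropolis algorithm can be sketched in a few lines of base R. This is a hypothetical stand-in written for illustration, not the course's metro_bern(); the data, the flat Beta(1, 1) prior, the starting value, and the proposal standard deviation are all assumptions.

```r
set.seed(341)
z <- 7; N <- 10   # hypothetical data: 7 successes in 10 trials

# Log of the unnormalized posterior: log prior + log likelihood
log_post <- function(theta) {
  if (theta <= 0 || theta >= 1) return(-Inf)   # outside the support
  dbeta(theta, 1, 1, log = TRUE) + dbinom(z, N, theta, log = TRUE)
}

n_steps  <- 20000
chain    <- numeric(n_steps)
chain[1] <- 0.5                                  # arbitrary starting value

for (i in 2:n_steps) {
  current  <- chain[i - 1]
  proposal <- current + rnorm(1, mean = 0, sd = 0.2)   # symmetric proposal
  # Accept with probability min(1, posterior(proposal) / posterior(current))
  log_ratio <- log_post(proposal) - log_post(current)
  chain[i]  <- if (log(runif(1)) < log_ratio) proposal else current
}

chain_mean <- mean(chain)   # should be near (1 + z) / (2 + N)
```

Because only the ratio of posterior values is used, the normalizing constant never has to be computed, which is what makes the algorithm practical when the grid method is not.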

Test 2

Date: Friday, April 5

Coverage: The test is cumulative but will emphasize the more recent material from DBDA, chapters 7-9, 14-17. (Not all topics in Chapter 15 were covered – see notes.)

Topics

In addition to the topics for test 1 that are still needed, here is a list of topics for test 2.

Generalized Linear Model (GLM) framework

Taxonomy of Variables

Explanatory (predictor) and response (predicted)
metric, count, dichotomous, nominal, ordinal
- converting categorical variables to numbers for JAGS and Stan

response = “average” + “noise”

Creating models

specifying a model with formulas and/or diagrams;
converting back and forth between model specifications and JAGS [or Stan] code.
interpreting parameters in the model (including derived parameters like the difference in means)
selecting priors for parameters in the model

Specific situations:

dichotomous ~ dichotomous
metric ~ dichotomous
metric ~ metric
metric ~ metric + dichotomous
metric ~ metric + metric
paired comparisons vs group-wise comparisons (eg, extra sleep ~ drug case study)
centering and standardization (how and why)
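The "how" of standardization, and of converting results back to the original scale, can be sketched as below. The data are simulated, and lm() (least squares) is used only as a quick stand-in for a Bayesian fit; the conversion formulas are the same either way.

```r
set.seed(341)
x <- rnorm(50, mean = 170, sd = 10)     # made-up metric predictor
y <- 5 + 0.4 * x + rnorm(50, sd = 3)    # made-up metric response

# Standardize: subtract the mean, divide by the standard deviation
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)

fit_raw <- lm(y ~ x)     # original scale
fit_std <- lm(zy ~ zx)   # standardized scale

# Convert standardized coefficients back to the original scale
zeta0 <- coef(fit_std)[["(Intercept)"]]
zeta1 <- coef(fit_std)[["zx"]]
slope_back     <- zeta1 * sd(y) / sd(x)
intercept_back <- mean(y) + zeta0 * sd(y) - slope_back * mean(x)
```

As for the "why": on the standardized scale the intercept and slope are much less correlated, which tends to help MCMC samplers, and generic priors are easier to choose because the parameters are on a known scale.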

Using transformations (how and why)

Distributions for Priors and Likelihoods

Commonly used distributions

Beta, Uniform, Triangle, Normal, T, Bernoulli, Binomial, Gamma, Exponential
parameterizations (in R/JAGS/Stan).

Selecting priors

how to choose shape and how to choose parameters (and explain your choice).
“uninformative” / “weakly informative” priors

R functions for working with distributions: gf_dist(); beta_params(); gamma_params(); dnorm(), pnorm(), qnorm(), rnorm(), and similar for other distributions.

If you use the word skewed, be sure you use it correctly.

Posterior Sampling

Useful R functions: jags(), sampling(), as.mcmc(), as.mcmc.list(), posterior(), hdi(), plot_post(), various plotting functions.

JAGS

high-level understanding of the Gibbs sampler (the method used by JAGS)
fitting models via the R2jags package
diagnosing whether JAGS has successfully sampled from (a good approximation to) the posterior distribution
sampling from a prior in JAGS (how and why).

Stan

High-level understanding of the HMC algorithm (the algorithm Stan uses) and why it works better than Gibbs sampling in some situations.
You won’t be required to write Stan code from scratch, but you may choose to use Stan instead of JAGS if not directed otherwise.
You may be given Stan code to use/interpret.

Preparing data for JAGS/Stan

converting categorical data to numbers (1 and 2, for dichotomous)
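One way to do this conversion in base R is to go through a factor. The variable and level names below are hypothetical.

```r
group <- c("control", "treatment", "treatment", "control")   # made-up data

# Fix the level order explicitly so the coding is predictable
group_f   <- factor(group, levels = c("control", "treatment"))
group_idx <- as.numeric(group_f)   # 1 = control, 2 = treatment

group_idx          # 1 2 2 1
levels(group_f)    # shows which number corresponds to which level
```

The resulting 1/2 coding can be used as an index into a vector of group-specific parameters in JAGS or Stan code.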

Interpreting the posterior distribution

plots (mcmc_ plots, plot_post(), custom plots after using posterior())
H(P)DI
what ROPE stands for
how/why to use ROPE
how to interpret the (posterior distributions of) the parameters in the models we have seen
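ROPE stands for "region of practical equivalence". The sketch below shows how an HDI and a ROPE comparison could be computed from posterior samples in base R. The helper hdi_from_sample() is a hypothetical stand-in for the course's hdi(), the "posterior" is simulated rather than taken from a fitted model, and the ROPE limits are arbitrary.

```r
# Hypothetical helper: shortest interval containing `prob` of the samples
hdi_from_sample <- function(x, prob = 0.95) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(prob * n)                 # number of samples inside the interval
  widths <- x[k:n] - x[1:(n - k + 1)]    # width of every candidate interval
  i <- which.min(widths)                 # the shortest one is the HDI
  c(lo = x[i], hi = x[i + k - 1])
}

set.seed(341)
post_diff <- rnorm(50000, mean = 0.5, sd = 1)   # stand-in for posterior samples

hdi95 <- hdi_from_sample(post_diff)

# ROPE: values treated as practically equivalent to 0 (limits are an assumption)
rope         <- c(-0.1, 0.1)
prop_in_rope <- mean(post_diff > rope[1] & post_diff < rope[2])
hdi_excludes_rope <- hdi95[["lo"]] > rope[2] || hdi95[["hi"]] < rope[1]
```

In this simulated example the HDI overlaps the ROPE, so the difference would not be declared practically different from 0.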

Posterior Predictive Checks

What a posterior predictive check is and what it is useful for.
Performing posterior predictive checks on a model.
Useful R functions: posterior_calc(), the ppc_ functions in bayesplot, CalvinBayes::rstudent_t()
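A posterior predictive check can be sketched in base R without any of the packages above. The data are made up to look "streaky", the model is independent Bernoulli trials with a flat Beta(1, 1) prior, and the test statistic (number of switches between 0 and 1) is one arbitrary choice among many.

```r
set.seed(341)
# Made-up data with long runs of 1s and 0s
y <- c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0)
N <- length(y)
z <- sum(y)

# Test statistic: how many times the sequence switches value
n_switches <- function(v) sum(diff(v) != 0)
obs_stat <- n_switches(y)

# 1. Draw parameter values from the posterior, Beta(1 + z, 1 + N - z)
theta_draws <- rbeta(4000, 1 + z, 1 + N - z)

# 2. For each draw, simulate a replicated data set and compute the statistic
rep_stat <- vapply(
  theta_draws,
  function(th) n_switches(rbinom(N, size = 1, prob = th)),
  numeric(1)
)

# 3. Compare: how often do replicated data have as few switches as the real data?
prop_as_extreme <- mean(rep_stat <= obs_stat)
```

A very small proportion suggests that the model (here, the independence assumption) fails to reproduce this feature of the data.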

Test 3

Date: Monday, May 6

Coverage: The test is cumulative but will emphasize the more recent material from DBDA, chapters 15 - 21 and 24. In particular, it will be primarily about Bayesian generalized linear models, how to fit them using brm(), and how to interpret the results.

Main Topics

Generalized Linear Model (GLM) framework

Taxonomy of Variables

Explanatory (predictor) and response (predicted)
metric, count, dichotomous, nominal, ordinal

Building blocks

response = “average” + “noise”; “average” is a linear function of the predictors (after applying the link function)
\[ \begin{aligned} \mathrm{link}(\mu) &= \mathrm{lin}(x); & y &\sim {\sf Dist}(\mu, \mbox{other parameters}) \\ \mu &= \mathrm{link}^{-1}(\mathrm{lin}(x)); & y &\sim {\sf Dist}(\mu, \mbox{other parameters}) \end{aligned} \]
recoding predictors
- dichotomous variables as indicator (0/1) variables
- nominal variables as multiple indicator variables
interaction terms
distributions for response (family, “noise”)
link functions
- logit link for dichotomous response (others possible: probit, robit, etc.)
- log link for Poisson (count response)
- log link for heterogeneous variances
- identity link
transforming variables (response or predictors – how and why)
hierarchical models (eg, fields in the fertilizer/tilling study)
paired comparisons vs group-wise comparisons (eg, extra sleep ~ drug case study)
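The link functions listed above map between the scale of the linear predictor and the scale of the mean. A small base R sketch, with arbitrary values standing in for lin(x):

```r
lin <- c(-2, 0, 2)   # hypothetical values of the linear predictor lin(x)

# Logit link (dichotomous response): the inverse link gives probabilities in (0, 1)
p        <- plogis(lin)   # inverse logit: 1 / (1 + exp(-lin))
lin_back <- qlogis(p)     # logit: log(p / (1 - p)), recovers lin

# Log link (e.g. Poisson count response): the inverse link gives positive means
lambda <- exp(lin)
log(lambda)               # recovers lin

# Identity link: the mean is the linear predictor itself
mu <- lin
```

In each case the model is linear on the link scale, so coefficients are interpreted on that scale (log odds for the logit link, log rate for the log link).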

Distributions for Priors and Likelihoods

Commonly used distributions

Beta, Uniform, Triangle, Normal, T, Bernoulli, Binomial, Gamma, Exponential, Poisson
parameterizations (in R/JAGS/Stan).

Selecting priors

how to choose shape and how to choose parameters (and explain your choice)
“uninformative” / “weakly informative” priors
improper uniform priors (commonly used default for brm())
R functions for working with distributions: gf_dist(); beta_params(); gamma_params(); dnorm(), pnorm(), qnorm(), rnorm(), and similar for other distributions.

If you use the word skewed, be sure you use it correctly.

Using Stan via the brms package

High-level understanding of the HMC algorithm (the algorithm Stan uses) and why it works better than Gibbs sampling in some situations.
brm()
- formulas to describe models
- setting priors (set_prior()), inspecting priors (prior_summary())
- family and link functions (eg, family = bernoulli(link = logit))
- data (and how brm() creates new variables for you)
- extracting information from a brmsfit object: stancode(), stanfit(), posterior(), etc.
- hierarchical models (eg, (1 | Field) in fertilizer/tilling study)
- setting number of iterations, chains, etc.
Using update() can make fitting subsequent models faster by avoiding the compile step.
Diagnostics (how to tell whether Stan seems to be working well)

Posterior Distributions

H(P)DI
what ROPE stands for
how/why to use ROPE
how to interpret the (posterior distributions of) the parameters in the models we have seen
contrasts and other quantities derived from model parameters/coefficients
useful R functions: jags(), sampling(), as.mcmc(), as.mcmc.list(), posterior(), mutate(), hypothesis(), hdi(), plot_post(), marginal_effects(), and various other plotting functions.

Comparing models

in-sample vs out-of-sample predictive accuracy
elpd and its approximations (especially WAIC and LOO)
loo(), waic(), compare(), loo_compare()

Final Exam

In-Class Portion: Tuesday, May 14 @ 9am

This will be mostly paper-and-pencil, but bring your laptop in case you need it here or there.

Take-Home Portion: due by noon on Friday, May 17.

The exam is available here.

Coverage: The test is cumulative but here are some things you will not need to know:

How to write JAGS code. If there is any JAGS, I will give you JAGS code and ask you about it rather than have you write it from scratch.
Power calculations. These are very important, but they can be time-consuming and we did not have much time to practice them. You should understand what power is, the general approach to calculating power, and how to interpret a power calculation.

Stat 341 Computational Bayesian Statistics Spring 2019
