In-class test: Monday, April 5
Take-home test: Due Wednesday, April 7
Coverage: The test is cumulative but will emphasize the more recent material from Statistical Rethinking chapters 5, 6, 8, and 9.
In addition to the Test 1 topics that are still relevant, here is a list of topics for Test 2.
Most of the models we have seen recently are in the category of multiple regression models.
These are part of a larger framework known as Generalized Linear Models – GLMs.
When we consider binary response variables, those are examples of GLMs (because the distribution of the response is something other than normal).
Explanatory (predictor) vs. response (predicted, outcome)
Numerical vs categorical
We’ve dealt mainly with metric numerical variables, but will eventually see some other types of numeric variables, like count data
Binary/dichotomous is a special case of categorical (and sometimes a bit easier to deal with), but you should also be able to handle categorical variables with more than two levels.
Converting categorical variables to numbers
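One common approach (the index coding used in *Statistical Rethinking*) can be sketched in base R with made-up data:

```r
# Index coding for a categorical predictor (hypothetical data):
# each level becomes an integer 1, 2, 3, ..., which index-style
# models then use to index a vector of parameters (e.g., a[species]).
species <- c("setosa", "versicolor", "setosa", "virginica")
species_idx <- as.integer(factor(species))
species_idx  # 1 2 1 3  (levels are sorted alphabetically)

# Dummy (0/1) coding for a binary variable:
sex <- c("male", "female", "female", "male")
is_female <- as.integer(sex == "female")
is_female  # 0 1 1 0
```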
Most of our models have been based on \(y \sim {\sf Norm}(\mu, \sigma)\) with some formula relating \(\mu\) to our explanatory variables.
This is not a requirement of Bayesian modeling, and we will start to see some other things soon.
In the “normal response” context, \(\mu\) is the “average” and \(\sigma\) quantifies the noise.
We have also seen a few models with a binary response; these are based on \(y \sim {\sf Binom}(n, p)\), using a binomial distribution for the response instead of a normal distribution.
Take advantage of algebra to help you figure out (a) what model you want or (b) what a given model is doing.
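As a small example of the kind of algebra that helps: binomial models typically relate \(p\) to predictors on the log-odds scale, and a little rearranging answers questions like “at what \(x\) is \(p = 0.5\)?” (the intercept and slope values below are made up):

```r
# logit(p) = a + b * x  <=>  p = inv_logit(a + b * x)
# Setting p = 0.5 means logit(p) = 0, so 0 = a + b * x  =>  x = -a / b.
inv_logit <- function(x) 1 / (1 + exp(-x))
a <- -1.2; b <- 0.8          # hypothetical posterior means
x50 <- -a / b                # x where p = 0.5; here 1.5
inv_logit(a + b * x50)       # 0.5
```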
quap() and ulam()
Main ones we have used: Normal, Exponential, Log-normal, Uniform, Triangle, Gamma, Beta, Binomial
What the distribution parameters mean and how they affect things like priors.
If you use the word ‘skewed’, be sure you use it correctly.
How to choose shape (family) and how to choose parameters (and explain your choice).
Prior predictive checks to understand priors and make sure they are reasonable
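A minimal prior predictive check, sketched in base R for a hypothetical model \(y \sim {\sf Norm}(a + bx, \sigma)\) with made-up priors:

```r
# Sample parameters from the priors, then sample responses from the
# likelihood; if the simulated y values are absurd, rethink the priors.
set.seed(123)
n <- 100
a     <- rnorm(n, 0, 10)   # hypothetical prior: a ~ Norm(0, 10)
b     <- rnorm(n, 0, 2)    # hypothetical prior: b ~ Norm(0, 2)
sigma <- rexp(n, 1)        # hypothetical prior: sigma ~ Exp(1)
x <- 1.5                   # some value of the predictor
y_prior <- rnorm(n, a + b * x, sigma)  # prior predictive draws
quantile(y_prior, c(0.05, 0.95))       # do these look plausible for y?
```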
R functions for working with distributions: gf_dist(); beta_params(); gamma_params(); dnorm(), pnorm(), qnorm(), rnorm(), and similar for other distributions.
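The base R d/p/q/r prefixes follow the same naming pattern for every distribution; for example:

```r
dnorm(0)          # density of Norm(0, 1) at 0: 1/sqrt(2*pi), about 0.399
pnorm(1.96)       # P(Z <= 1.96), about 0.975
qnorm(0.975)      # inverse of pnorm: about 1.96
set.seed(5)
rnorm(3, 10, 2)   # 3 random draws from Norm(10, 2)
# Same pattern for other families: dbinom/pbinom/qbinom/rbinom,
# dgamma/..., dbeta/..., dexp/..., dunif/..., etc.
```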
How MCMC algorithms sample from the posterior
Metropolis Algorithm
Hamiltonian Monte Carlo (HMC)
Fitting models with Stan via ulam()
Basic diagnostics
Preparing data for Stan
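The core propose/accept loop of the Metropolis algorithm can be sketched in a few lines of base R; here the “posterior” is just a standard normal, so the sampler can be checked against known answers:

```r
# Bare-bones Metropolis sampler targeting a Norm(0, 1) "posterior".
set.seed(42)
log_post <- function(theta) dnorm(theta, 0, 1, log = TRUE)
n_iter <- 5000
chain <- numeric(n_iter)
theta <- 0                                     # starting value
for (i in 1:n_iter) {
  proposal <- theta + rnorm(1, 0, 1)           # symmetric proposal
  log_r <- log_post(proposal) - log_post(theta)
  if (log(runif(1)) < log_r) theta <- proposal # accept; otherwise stay put
  chain[i] <- theta
}
mean(chain); sd(chain)  # should be near 0 and 1
```

HMC (what Stan uses) replaces the random-walk proposal with a physics-inspired trajectory, which explores the posterior much more efficiently in high dimensions.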
Fundamental confounds (fork, pipe, collider, descendant)
Causal and non-causal (backdoor) paths
How to select variables for inclusion to estimate total causal effect of \(X\) on \(Y\).
dagitty and ggdag packages, plus CalvinBayes::gg_dag()
What a posterior sample is and how to use it
What the precis() summary tells us
Useful R functions: extract.samples(), link(), sim(), apply(), mean_hdi(), tibble(), mutate(), filter(), bind_rows(), bind_cols(), etc. to get things into the desired format.
Plots
mcmc_ plots
link() or sim()
H(P)DI
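One way to see what an HDI is: among all intervals containing the required posterior probability, it is the shortest. A hand-rolled base R version, for illustration only (mean_hdi() does this for you):

```r
# Highest density interval from a posterior sample: slide a window of
# ceiling(prob * n) consecutive sorted draws and keep the narrowest one.
hdi <- function(samples, prob = 0.9) {
  s <- sort(samples)
  n <- length(s)
  k <- ceiling(prob * n)                  # number of draws in the interval
  widths <- s[k:n] - s[1:(n - k + 1)]     # width of each candidate window
  i <- which.min(widths)
  c(lower = s[i], upper = s[i + k - 1])
}
set.seed(1)
ci <- hdi(rnorm(1e4), 0.9)
ci   # roughly (-1.64, 1.64) for this symmetric "posterior"
```

For a symmetric posterior the HDI and the central (percentile) interval nearly coincide; for a skewed posterior they differ.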
How to interpret the (posterior distributions of) the parameters in the models we have seen
What link() and sim() do (and how they differ)
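The difference between link() and sim() can be sketched by hand with a fake posterior sample for a normal-response model: link() computes posterior draws of the mean \(\mu\), while sim() also adds the observation noise. (The posterior values below are made up.)

```r
# A made-up "posterior sample" of 1000 draws for y ~ Norm(a + b*x, sigma):
set.seed(7)
post <- data.frame(a     = rnorm(1000, 3, 0.2),
                   b     = rnorm(1000, 0.5, 0.1),
                   sigma = rexp(1000, 2))
x_new <- 2
# What link() does: posterior draws of the *mean*, mu = a + b * x
mu <- post$a + post$b * x_new
# What sim() does: posterior *predictive* draws, adding Norm(0, sigma) noise
y_rep <- rnorm(1000, mu, post$sigma)
sd(mu) < sd(y_rep)   # TRUE: predictions vary more than the mean does
```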