Logistics

Topics

In addition to the topics from Test 1, which are still fair game, here is a list of topics for Test 2.

(Multiple) Regression Models

Most of the models we have seen recently are in the category of multiple regression models.

  • These are part of a larger framework known as Generalized Linear Models (GLMs).

  • When we consider binary response variables, those are examples of GLMs (because the distribution of the response is something other than normal).

Taxonomy of Variables

  • Explanatory (predictor) vs. response (predicted, outcome)

  • Numerical vs. categorical

    • We’ve dealt mainly with metric numerical variables, but will eventually see some other types of numeric variables, like count data

    • Binary/dichotomous is a special case of categorical (and sometimes a bit easier to deal with), but you should also be able to handle categorical variables with more than two levels.

  • Converting categorical variables to numbers (see the sketch after this list)

    • index variables (starting with 1)
    • indicator variables (0/1)
    • dummy variables (multiple indicator variables used together)
    • reasons to prefer index variables to indicator/dummy variables (most of the time)
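
A minimal sketch of these coding schemes, using a small hypothetical data frame:

```r
# A small hypothetical data frame with a three-level categorical variable.
d <- data.frame(group = c("a", "b", "c", "a", "b"))

# Index variable: integers 1, 2, 3, ... (one model parameter per level).
d$group_idx <- as.integer(factor(d$group))

# Indicator variables: 0/1 columns; used together they are dummy variables,
# one column for each level beyond the first.
d$group_b <- as.integer(d$group == "b")
d$group_c <- as.integer(d$group == "c")
```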

response = “average” + “noise”

  • Most of our models have been based on \(y \sim {\sf Norm}(\mu, \sigma)\) with some formula relating \(\mu\) to our explanatory variables (see the sketch after this list).

    • This is not a requirement of Bayesian modeling, and we will start to see some other things soon.

    • We have also seen a few models that have a binary response (so they use a binomial distribution for the response instead of a normal distribution).

  • In the “normal response” context, \(\mu\) = “average” and \(\sigma\) quantifies the noise.

    • Be able to interpret what \(\sigma\) tells you in the context of the model.
    • Be able to interpret what \(\mu\) tells you in the context of the model.
    • Be able to interpret what the parameters involved in the formula for \(\mu\) tell you in the context of the model.
    • In most of our models \(\sigma\) has been the same for all observations, but this is not required. (See the brief section on “multiple \(\sigma\)’s” for an example.)
  • A few of our models have been based on \(y \sim {\sf Binom}(n, p)\).

    • “average” becomes a proportion in these models
  • Take advantage of algebra to help figure out (a) what model you want or (b) what a given model is doing.
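
Here is a minimal sketch of a “normal response” model in quap() notation; the data frame d and the variable names are hypothetical:

```r
library(rethinking)

# Hypothetical data: response y, numeric predictor x.
m <- quap(
  alist(
    y ~ dnorm(mu, sigma),   # response = "average" + "noise"
    mu <- a + b * x,        # formula relating the average to a predictor
    a ~ dnorm(0, 1),
    b ~ dnorm(0, 1),
    sigma ~ dexp(1)         # sigma quantifies the noise
  ),
  data = d
)
```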

Designing models

  • Specifying a model with formulas and/or computer code (quap() and ulam())
  • Interpreting parameters in the model (including derived parameters like a difference in means)
  • Selecting priors for parameters in the model
  • Prior predictive checks
  • Centering/standardization (how and why)
  • Other transformations (e.g., log)
  • Deciding which variables to include as predictors
  • Interaction (what, when, how, and interpretation; a sketch illustrating standardization and interaction follows this list)
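
A minimal sketch combining standardization and an interaction, using hypothetical data and an index variable g with two levels:

```r
library(rethinking)

# Hypothetical data: response y, numeric predictor x, two groups.
d <- data.frame(
  y = rnorm(100),
  x = runif(100, 0, 10),
  g = sample(1:2, 100, replace = TRUE)   # index variable (1 or 2)
)
d$x_s <- (d$x - mean(d$x)) / sd(d$x)     # standardize: mean 0, sd 1

m <- quap(
  alist(
    y ~ dnorm(mu, sigma),
    mu <- a[g] + b[g] * x_s,   # intercept AND slope differ by group;
    a[g] ~ dnorm(0, 1),        # letting the slope vary by group is
    b[g] ~ dnorm(0, 0.5),      # one way to express an interaction
    sigma ~ dexp(1)
  ),
  data = d
)
```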

Distributions for Priors and Likelihoods

Commonly used distributions

  • Main ones we have used: Normal, Exponential, Log-normal, Uniform, Triangle, Gamma, Beta, Binomial

  • What the distribution parameters mean and how they affect things like priors (see the sketch after this list).

    • Note: We have two contexts for the word parameter – parameters of our model and parameters of distributions. Sometimes we use a model parameter (or combination of model parameters) to specify a distribution parameter. Sometimes (in our priors, for example) we set distribution parameters to numbers.
  • If you use the word ‘skewed’, be sure you use it correctly.
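
For instance, gf_dist() makes it easy to see what the parameters of these families do (the parameter values below are arbitrary):

```r
library(ggformula)

gf_dist("norm", mean = 0, sd = 1)            # symmetric around the mean
gf_dist("exp", rate = 0.5)                   # right-skewed; support (0, Inf)
gf_dist("lnorm", meanlog = 0, sdlog = 0.5)   # right-skewed; support (0, Inf)
gf_dist("beta", shape1 = 2, shape2 = 5)      # support (0, 1)
```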

Selecting priors

  • How to choose shape (family) and how to choose parameters (and explain your choice).

  • Prior predictive checks to understand priors and make sure they are reasonable (see the sketch after this list)

  • R functions for working with distributions: gf_dist(), beta_params(), gamma_params(), dnorm(), pnorm(), qnorm(), rnorm(), and similar functions for other distributions.
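
Here is a minimal sketch of a prior predictive check for a hypothetical intercept-and-slope model: sample parameters from the priors and look at the regression lines they imply.

```r
library(ggformula)

set.seed(123)
prior <- data.frame(
  a = rnorm(100, mean = 0, sd = 1),    # intercept prior
  b = rnorm(100, mean = 0, sd = 0.5)   # slope prior
)

# Each (a, b) draw implies a line mu = a + b * x.
# Do these lines look plausible for the context at hand?
gf_abline(intercept = ~ a, slope = ~ b, data = prior, alpha = 0.2) |>
  gf_lims(x = c(-3, 3), y = c(-5, 5))
```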

MCMC Algorithms

  • How MCMC algorithms sample from the posterior

  • Metropolis Algorithm

    • King Markov story and how it relates to the use of the Metropolis algorithm in Bayesian modeling
    • Jump rules, proposal acceptance, etc. (see the Metropolis sketch after this list)
  • Hamiltonian Monte Carlo (HMC)

    • Big ideas behind that algorithm
    • Why it is often better than the basic Metropolis algorithm
  • Fitting models with Stan via ulam() (see the ulam() sketch after this list)

  • Basic diagnostics

    • effective sample size
    • Rhat
    • trace plots
  • Preparing data for Stan

    • remove missing values (including missing values in variables not used in the model!)
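
To make the jump rule and the acceptance step concrete, here is a bare-bones Metropolis sampler (a toy sketch, not code from the course) that draws from a Normal(0, 1) target:

```r
set.seed(42)
target <- function(x) dnorm(x, mean = 0, sd = 1)   # density we want to sample

n_steps <- 5000
chain <- numeric(n_steps)
chain[1] <- -3                                     # arbitrary starting value
for (i in 2:n_steps) {
  proposal <- rnorm(1, mean = chain[i - 1], sd = 0.5)   # symmetric jump rule
  # Accept with probability min(1, target(proposal) / target(current)).
  if (runif(1) < target(proposal) / target(chain[i - 1])) {
    chain[i] <- proposal          # accept: move to the proposed value
  } else {
    chain[i] <- chain[i - 1]      # reject: the current value repeats
  }
}
```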
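
And a minimal sketch of fitting with ulam() and checking the basic diagnostics; the data frame d and the variable names are hypothetical:

```r
library(rethinking)

# Stan cannot handle NAs: keep only complete cases, and pass only the
# variables the model actually uses.
d2 <- d[complete.cases(d[, c("y", "x")]), c("y", "x")]

m <- ulam(
  alist(
    y ~ dnorm(mu, sigma),
    mu <- a + b * x,
    a ~ dnorm(0, 1),
    b ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = d2, chains = 4, cores = 4
)

precis(m)      # summary includes n_eff (effective sample size) and Rhat
traceplot(m)   # chains should mix well and overlap one another
```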

Causal DAGs

  • Fundamental confounds (fork, pipe, collider, descendant)

  • Causal and non-causal (backdoor) paths

  • How to select variables for inclusion to estimate the total causal effect of \(X\) on \(Y\) (see the sketch after this list).

  • dagitty and ggdag packages, plus CalvinBayes::gg_dag()
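
A minimal sketch using a hypothetical DAG in which Z confounds the effect of X on Y:

```r
library(dagitty)

dag <- dagitty("dag { X -> Y; Z -> X; Z -> Y }")

# The backdoor path X <- Z -> Y is non-causal; conditioning on Z closes it.
adjustmentSets(dag, exposure = "X", outcome = "Y")
#> { Z }
```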

Posterior Sampling

  • What a posterior sample is and how to use it

  • What the precis() summary tells us

  • Useful R functions: extract.samples(), link(), sim(), apply(), mean_hdi()

    • also tibble(), mutate(), filter(), bind_rows(), bind_cols(), etc. to get things into the desired format.
  • Plots

    • mcmc_* plots (from the bayesplot package)
    • custom plots created using posterior samples, or link() or sim()
  • H(P)DI

  • How to interpret the (posterior distributions of the) parameters in the models we have seen

  • What link() and sim() do (and how they differ); see the sketch below
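
A minimal sketch pulling these pieces together, assuming a fitted model m with a numeric predictor x (hypothetical names):

```r
library(rethinking)

post <- extract.samples(m)        # posterior samples, one element per parameter
mean(post$b)                      # posterior mean of a single parameter
HPDI(post$b, prob = 0.95)         # highest posterior density interval

new_data <- data.frame(x = seq(-2, 2, length.out = 50))
mu <- link(m, data = new_data)    # posterior samples of the AVERAGE response
y_sim <- sim(m, data = new_data)  # posterior predictive samples (average + noise)

# Columns correspond to rows of new_data; summarize over posterior samples.
mu_mean <- apply(mu, 2, mean)
mu_hpdi <- apply(mu, 2, HPDI, prob = 0.95)
```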