Logistics

Topics

In addition to the topics from Test 1, which are still fair game, here is a list of topics for Test 2.

(Multiple) Regression Models

Most of the models we have seen recently are in the category of multiple regression models.

  • These are part of a larger framework known as Generalized Linear Models (GLMs).

  • When we consider binary response variables, those are examples of GLMs (because the distribution of the response is something other than normal).

Taxonomy of Variables

  • Explanatory (predictor) vs. response (predicted, outcome)

  • Numerical vs. categorical

    • We’ve dealt mainly with metric numerical variables, but will eventually see some other types of numeric variables, like count data

    • Binary/dichotomous is a special case of categorical (and sometimes a bit easier to deal with), but you should also be able to handle categorical variables with more than two levels.

  • Converting categorical variables to numbers (see the sketch after this list)

    • index variables (starting with 1)
    • indicator variables (0/1)
    • dummy variables (multiple indicator variables used together)
    • reasons to prefer index variables to indicator/dummy variables (most of the time)
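
A minimal sketch of these coding schemes, using a small hypothetical data frame:

```r
# A small hypothetical data frame with a three-level categorical variable.
d <- data.frame(group = c("a", "b", "c", "a", "b"))

# Index variable: integers 1, 2, 3, ... (one model parameter per level).
d$group_idx <- as.integer(factor(d$group))

# Indicator variables: 0/1 columns; used together they are dummy variables,
# one column for each level beyond the first.
d$group_b <- as.integer(d$group == "b")
d$group_c <- as.integer(d$group == "c")
```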

response = “average” + “noise”

  • Most of our models have been based on \(y \sim {\sf Norm}(\mu, \sigma)\) with some formula relating \(\mu\) to our explanatory variables (see the sketch after this list).

    • This is not a requirement of Bayesian modeling, and we will start to see some other things soon.

    • We have also seen a few models that have a binary response (so they use a binomial distribution for the response instead of a normal distribution).

  • In the “normal response” context, \(\mu\) = “average” and \(\sigma\) quantifies the noise.

    • Be able to interpret what \(\sigma\) tells you in the context of the model.
    • Be able to interpret what \(\mu\) tells you in the context of the model.
    • Be able to interpret what the parameters involved in the formula for \(\mu\) tell you in the context of the model.
    • In most of our models \(\sigma\) has been the same for all observations, but this is not required. (See the brief section on “multiple \(\sigma\)’s” for an example.)
  • A few of our models have been based on \(y \sim {\sf Binom}(n, p)\).

    • “average” becomes a proportion in these models
  • Take advantage of algebra to help figure out (a) what model you want or (b) what a given model is doing.
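
Here is a minimal sketch of a “normal response” model in quap() notation; the data frame d and the variable names are hypothetical:

```r
library(rethinking)

# Hypothetical data: response y, numeric predictor x.
m <- quap(
  alist(
    y ~ dnorm(mu, sigma),   # response = "average" + "noise"
    mu <- a + b * x,        # formula relating the average to a predictor
    a ~ dnorm(0, 1),
    b ~ dnorm(0, 1),
    sigma ~ dexp(1)         # sigma quantifies the noise
  ),
  data = d
)
```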

Designing models

  • Specifying a model with formulas and/or computer code (quap() and ulam())
  • Interpreting parameters in the model (including derived parameters like a difference in means)
  • Selecting priors for parameters in the model
  • Prior predictive checks
  • Centering/standardization (how and why)
  • Other transformations (e.g., log)
  • Deciding which variables to include as predictors
  • Interaction (what, when, how, and interpretation; a sketch illustrating standardization and interaction follows this list)
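
A minimal sketch combining standardization and an interaction, using hypothetical data and an index variable g with two levels:

```r
library(rethinking)

# Hypothetical data: response y, numeric predictor x, two groups.
d <- data.frame(
  y = rnorm(100),
  x = runif(100, 0, 10),
  g = sample(1:2, 100, replace = TRUE)   # index variable (1 or 2)
)
d$x_s <- (d$x - mean(d$x)) / sd(d$x)     # standardize: mean 0, sd 1

m <- quap(
  alist(
    y ~ dnorm(mu, sigma),
    mu <- a[g] + b[g] * x_s,   # intercept AND slope differ by group;
    a[g] ~ dnorm(0, 1),        # letting the slope vary by group is
    b[g] ~ dnorm(0, 0.5),      # one way to express an interaction
    sigma ~ dexp(1)
  ),
  data = d
)
```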

Distributions for Priors and Likelihoods

Commonly used distributions

  • Main ones we have used: Normal, Exponential, Log-normal, Uniform, Triangle, Gamma, Beta, Binomial

  • What the distribution parameters mean and how they affect things like priors (see the sketch after this list).

    • Note: We have two contexts for the word parameter – parameters of our model and parameters of distributions. Sometimes we use a model parameter (or combination of model parameters) to specify a distribution parameter. Sometimes (in our priors, for example) we set distribution parameters to numbers.
  • If you use the word ‘skewed’, be sure you use it correctly.
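
For instance, gf_dist() makes it easy to see what the parameters of these families do (the parameter values below are arbitrary):

```r
library(ggformula)

gf_dist("norm", mean = 0, sd = 1)            # symmetric around the mean
gf_dist("exp", rate = 0.5)                   # right-skewed; support (0, Inf)
gf_dist("lnorm", meanlog = 0, sdlog = 0.5)   # right-skewed; support (0, Inf)
gf_dist("beta", shape1 = 2, shape2 = 5)      # support (0, 1)
```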

Selecting priors

  • How to choose shape (family) and how to choose parameters (and explain your choice).

  • Prior predictive checks to understand priors and make sure they are reasonable (see the sketch after this list)

  • R functions for working with distributions: gf_dist(), beta_params(), gamma_params(), dnorm(), pnorm(), qnorm(), rnorm(), and similar functions for other distributions.
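
Here is a minimal sketch of a prior predictive check for a hypothetical intercept-and-slope model: sample parameters from the priors and look at the regression lines they imply.

```r
library(ggformula)

set.seed(123)
prior <- data.frame(
  a = rnorm(100, mean = 0, sd = 1),    # intercept prior
  b = rnorm(100, mean = 0, sd = 0.5)   # slope prior
)

# Each (a, b) draw implies a line mu = a + b * x.
# Do these lines look plausible for the context at hand?
gf_abline(intercept = ~ a, slope = ~ b, data = prior, alpha = 0.2) |>
  gf_lims(x = c(-3, 3), y = c(-5, 5))
```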

MCMC Algorithms

  • How MCMC algorithms sample from the posterior

  • Metropolis Algorithm

    • King Markov story and how it relates to the use of the Metropolis algorithm in Bayesian modeling
    • Jump rules, proposal acceptance, etc. (see the Metropolis sketch after this list)
  • Hamiltonian Monte Carlo (HMC)

    • Big ideas behind that algorithm
    • Why it is often better than the basic Metropolis algorithm
  • Fitting models with Stan via ulam() (see the ulam() sketch after this list)

  • Basic diagnostics

    • effective sample size
    • Rhat
    • trace plots
  • Preparing data for Stan

    • remove missing values (including missing values in variables not used in the model!)
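
To make the jump rule and the acceptance step concrete, here is a bare-bones Metropolis sampler (a toy sketch, not code from the course) that draws from a Normal(0, 1) target:

```r
set.seed(42)
target <- function(x) dnorm(x, mean = 0, sd = 1)   # density we want to sample

n_steps <- 5000
chain <- numeric(n_steps)
chain[1] <- -3                                     # arbitrary starting value
for (i in 2:n_steps) {
  proposal <- rnorm(1, mean = chain[i - 1], sd = 0.5)   # symmetric jump rule
  # Accept with probability min(1, target(proposal) / target(current)).
  if (runif(1) < target(proposal) / target(chain[i - 1])) {
    chain[i] <- proposal          # accept: move to the proposed value
  } else {
    chain[i] <- chain[i - 1]      # reject: the current value repeats
  }
}
```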
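
And a minimal sketch of fitting with ulam() and checking the basic diagnostics; the data frame d and the variable names are hypothetical:

```r
library(rethinking)

# Stan cannot handle NAs: keep only complete cases, and pass only the
# variables the model actually uses.
d2 <- d[complete.cases(d[, c("y", "x")]), c("y", "x")]

m <- ulam(
  alist(
    y ~ dnorm(mu, sigma),
    mu <- a + b * x,
    a ~ dnorm(0, 1),
    b ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = d2, chains = 4, cores = 4
)

precis(m)      # summary includes n_eff (effective sample size) and Rhat
traceplot(m)   # chains should mix well and overlap one another
```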

Causal DAGs

  • Fundamental confounds (fork, pipe, collider, descendant)

  • Causal and non-causal (backdoor) paths

  • How to select variables for inclusion to estimate the total causal effect of \(X\) on \(Y\) (see the sketch after this list).

  • dagitty and ggdag packages, plus CalvinBayes::gg_dag()
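
A minimal sketch using a hypothetical DAG in which Z confounds the effect of X on Y:

```r
library(dagitty)

dag <- dagitty("dag { X -> Y; Z -> X; Z -> Y }")

# The backdoor path X <- Z -> Y is non-causal; conditioning on Z closes it.
adjustmentSets(dag, exposure = "X", outcome = "Y")
#> { Z }
```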

Posterior Sampling

  • What a posterior sample is and how to use it

  • What the precis() summary tells us

  • Useful R functions: extract.samples(), link(), sim(), apply(), mean_hdi()

    • also tibble(), mutate(), filter(), bind_rows(), bind_cols(), etc. to get things into the desired format.
  • Plots

    • mcmc_* plots (from the bayesplot package)
    • custom plots created using posterior samples, or link() or sim()
  • H(P)DI

  • How to interpret the (posterior distributions of the) parameters in the models we have seen

  • What link() and sim() do (and how they differ); see the sketch below
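
A minimal sketch pulling these pieces together, assuming a fitted model m with a numeric predictor x (hypothetical names):

```r
library(rethinking)

post <- extract.samples(m)        # posterior samples, one element per parameter
mean(post$b)                      # posterior mean of a single parameter
HPDI(post$b, prob = 0.95)         # highest posterior density interval

new_data <- data.frame(x = seq(-2, 2, length.out = 50))
mu <- link(m, data = new_data)    # posterior samples of the AVERAGE response
y_sim <- sim(m, data = new_data)  # posterior predictive samples (average + noise)

# Columns correspond to rows of new_data; summarize over posterior samples.
mu_mean <- apply(mu, 2, mean)
mu_hpdi <- apply(mu, 2, HPDI, prob = 0.95)
```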