Main Topics
Generalized Linear Model (GLM) framework
Taxonomy of Variables
- Explanatory (predictor) and response (predicted)
- metric, count, dichotomous, nominal, ordinal
Building blocks
- response = “average” + “noise”; the link-transformed “average” is a linear function of the predictors (the two rows below say the same thing) \[ \begin{aligned} \mathrm{link}(\mu) &= \mathrm{lin}(x), & y &\sim {\sf Dist}(\mu, \mbox{other parameters}) \\ \mu &= \mathrm{link}^{-1}(\mathrm{lin}(x)), & y &\sim {\sf Dist}(\mu, \mbox{other parameters}) \end{aligned} \]
- recoding predictors
- dichotomous variables as indicator (0/1) variables
- nominal variables as multiple indicator variables
- interaction terms
- distributions for response (family, “noise”)
- link functions
- logit link for dichotomous response (others possible: probit, robit, etc.)
- log link for Poisson (count response)
- log link for heterogeneous variances
- identity link
- transforming variables (response or predictors – how and why)
- hierarchical models (eg, fields in the fertilizer/tilling study)
- paired comparisons vs group-wise comparisons (eg, extra sleep ~ drug case study)
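As a reminder of how these building blocks fit together, here is a minimal base-R sketch. The data frame `toy`, its column names, and the coefficient values are invented for illustration and do not come from any of the case studies.

```r
# Hypothetical toy data: a metric predictor, a dichotomous predictor,
# and a nominal predictor with three levels.
toy <- data.frame(
  dose  = c(1, 2, 3, 4, 5, 6),
  sex   = factor(c("F", "M", "F", "M", "F", "M")),
  group = factor(c("a", "b", "c", "a", "b", "c"))
)

# model.matrix() shows the recoding R applies to a formula:
# sex becomes one 0/1 indicator, group becomes two indicators
# (the first level of each factor is the reference level).
X <- model.matrix(~ dose + sex + group, data = toy)
colnames(X)

# An interaction term adds a column that is a product of other columns.
X_int <- model.matrix(~ dose * sex, data = toy)
colnames(X_int)

# lin(x): a linear function of the recoded predictors
# (coefficients chosen arbitrarily for illustration).
beta <- c(-2, 0.5, 1, 0, 0.3)
lin  <- as.vector(X %*% beta)

# Logit link: mu = inverse link of lin(x) is a probability.
mu <- plogis(lin)              # inverse logit
all.equal(qlogis(mu), lin)     # the logit link undoes it

# Log link: mu = exp(lin(x)) is a positive rate for a count response.
rate <- exp(lin)

# "Noise": draw responses from the chosen family.
set.seed(1)
y_binary <- rbinom(nrow(toy), size = 1, prob = mu)
y_count  <- rpois(nrow(toy), lambda = rate)
```

The same linear predictor feeds both responses; only the inverse link and the family change.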
Distributions for Priors and Likelihoods
Commonly used distributions
- Beta, Uniform, Triangle, Normal, T, Bernoulli, Binomial, Gamma, Exponential, Poisson
- parameterizations (in R/JAGS/Stan).
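Parameterizations are easy to mix up, so a small sketch of the usual conversions may help. The helper names below (`gamma_shape_rate()`, `beta_shapes()`, `sd_to_precision()`) are hypothetical and written only for this example; they are not the course's `gamma_params()` or `beta_params()`.

```r
# Gamma: convert a mean and standard deviation to (shape, rate).
# R's dgamma(), JAGS's dgamma(), and Stan's gamma() all accept shape and rate.
gamma_shape_rate <- function(mean, sd) {
  c(shape = mean^2 / sd^2, rate = mean / sd^2)
}

# Beta: convert a mode and concentration (must be > 2) to the two shapes.
beta_shapes <- function(mode, concentration) {
  c(shape1 = mode * (concentration - 2) + 1,
    shape2 = (1 - mode) * (concentration - 2) + 1)
}

# Normal: R's dnorm() and Stan's normal() use the standard deviation,
# while JAGS's dnorm() uses the precision, 1 / sd^2.
sd_to_precision <- function(sd) 1 / sd^2

g <- gamma_shape_rate(mean = 4, sd = 2)
g                                   # shape 4, rate 1
g[["shape"]] / g[["rate"]]          # recovers the mean
sqrt(g[["shape"]]) / g[["rate"]]    # recovers the sd

b <- beta_shapes(mode = 0.8, concentration = 12)
(b[["shape1"]] - 1) / (b[["shape1"]] + b[["shape2"]] - 2)  # recovers the mode

sd_to_precision(2)                  # 0.25
```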
Selecting priors
- how to choose shape and how to choose parameters (and explain your choice)
- “uninformative” / “weakly informative” priors
- improper uniform priors (commonly used default for brm())
- R functions for working with distributions: gf_dist(); beta_params(); gamma_params(); dnorm(), pnorm(), qnorm(), rnorm(), and similar for other distributions.
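The d/p/q/r naming pattern is the same for every distribution family in base R. A short sketch, using arbitrary parameter values chosen only for illustration:

```r
# d: density (height of the curve) at a value
dnorm(1, mean = 0, sd = 1)

# p: cumulative probability P(X <= x)
p <- pnorm(1.96)

# q: quantile, the inverse of p
q <- qnorm(0.975)

# r: random draws
set.seed(123)
draws <- rnorm(1000, mean = 10, sd = 2)

# The same four prefixes work for other families, for example
# a central (equal-tailed) 95% interval for a Beta(9, 3) prior:
beta_interval <- qbeta(c(0.025, 0.975), shape1 = 9, shape2 = 3)

# and the prior probability that a Gamma(shape = 4, rate = 1) value exceeds 10:
gamma_tail <- 1 - pgamma(10, shape = 4, rate = 1)
```

Note that an equal-tailed interval from a `q` function is generally not the same as an HDI when the distribution is asymmetric.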
If you use the word skewed, be sure you use it correctly.
Using Stan via the brms package
- High-level understanding of the HMC algorithm (the algorithm Stan uses) and why it works better than Gibbs sampling in some situations.
- brm()
- formulas to describe models
- setting priors (set_prior()), inspecting priors (prior_summary())
- family and link functions (eg, family = bernoulli(link = logit))
- data (and how brm() creates new variables for you)
- extracting information from a brmsfit object: stancode(), stanfit(), posterior(), etc.
- hierarchical models (eg, (1 | Field) in fertilizer/tilling study)
- setting number of iterations, chains, etc.
- Using update() can make fitting subsequent models faster by avoiding the compile step.
- Diagnostics (how to tell whether Stan seems to be working well)
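The pieces above might be combined roughly as follows. This is a sketch under stated assumptions, not the model from the fertilizer/tilling study: the data are simulated, the column names `y`, `x`, and `Field` are hypothetical, and the priors are arbitrary. The brms calls are wrapped in a function (`fit_sketch()`, also hypothetical) so that nothing is compiled until it is called; running it requires the brms package and a working Stan toolchain.

```r
# Hypothetical data: a dichotomous response, one metric predictor,
# and a grouping variable named Field.
set.seed(42)
n_fields <- 8
sim <- data.frame(
  Field = factor(rep(seq_len(n_fields), each = 10)),
  x     = rnorm(n_fields * 10)
)
field_effect <- rnorm(n_fields, sd = 0.5)
sim$y <- rbinom(
  nrow(sim), size = 1,
  prob = plogis(-0.5 + 1.2 * sim$x + field_effect[as.integer(sim$Field)])
)

fit_sketch <- function(data) {
  # Priors chosen only for illustration.
  priors <- c(
    brms::set_prior("normal(0, 2)", class = "b"),
    brms::set_prior("normal(0, 2)", class = "Intercept")
  )
  fit <- brms::brm(
    y ~ x + (1 | Field),                      # (1 | Field): varying intercepts
    data   = data,
    family = brms::bernoulli(link = "logit"),
    prior  = priors,
    chains = 4, iter = 2000, warmup = 1000, seed = 42
  )
  print(brms::prior_summary(fit))   # which priors were actually used
  print(fit)                        # summary includes Rhat and effective sample sizes
  # update() reuses the compiled model when only sampler settings or data change.
  fit_more <- update(fit, iter = 4000)
  list(fit = fit, fit_more = fit_more)
}
```

Calling `fits <- fit_sketch(sim)` would fit the model twice, compiling only once. Rhat values near 1 and reasonably large effective sample sizes in the printed summary are the first things to check.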
Posterior Distributions
- H(P)DI
- what ROPE stands for
- how/why to use ROPE
- how to interpret (the posterior distributions of) the parameters in the models we have seen
- contrasts and other quantities derived from model parameters/coefficients
- useful R functions: jags(), sampling(), as.mcmc(), as.mcmc.list(), posterior(), mutate(), hypothesis(), hdi(), plot_post(), marginal_effects(), and various other plotting functions.
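To make the HDI, ROPE, and contrast ideas concrete, here is a base-R sketch that works directly on posterior draws. The draws are simulated stand-ins rather than output from a fitted model, and `hdi_from_draws()` and `prop_in_rope()` are hypothetical helpers written for this example; they are not the `hdi()` function listed above.

```r
set.seed(7)
# Stand-in for posterior draws of two group coefficients.
draws <- data.frame(
  b_groupA = rnorm(4000, mean = 1.0, sd = 0.3),
  b_groupB = rnorm(4000, mean = 1.4, sd = 0.3)
)

# A contrast is computed draw by draw, then summarized.
draws$diff <- draws$b_groupB - draws$b_groupA

# HDI: the shortest interval containing a fraction `prob` of the draws.
hdi_from_draws <- function(x, prob = 0.95) {
  sorted <- sort(x)
  n <- length(sorted)
  k <- ceiling(prob * n)                 # number of draws inside the interval
  starts <- seq_len(n - k + 1)           # candidate left endpoints
  widths <- sorted[starts + k - 1] - sorted[starts]
  best <- which.min(widths)
  c(lower = sorted[best], upper = sorted[best + k - 1])
}

# ROPE: proportion of the posterior inside a region of practical equivalence.
prop_in_rope <- function(x, rope) mean(x > rope[1] & x < rope[2])

hdi_diff  <- hdi_from_draws(draws$diff)
rope_diff <- prop_in_rope(draws$diff, rope = c(-0.1, 0.1))
p_positive <- mean(draws$diff > 0)       # posterior probability that B exceeds A
```

The ROPE limits of ±0.1 are arbitrary here; in practice they should be justified by what counts as a negligible difference in the application.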
Comparing models
- in-sample vs out-of-sample predictive accuracy
- elpd and its approximations (especially WAIC and LOO)
- loo(), waic(), compare(), loo_compare()
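The WAIC approximation to elpd can be computed by hand from a matrix of pointwise log-likelihoods, which may help in understanding what `waic()` reports. The sketch below assumes a deliberately simple model (normal data with known standard deviation 1 and a flat prior on the mean) so that posterior draws can be simulated directly; all object names are hypothetical.

```r
set.seed(2024)
y <- rnorm(50, mean = 1, sd = 1)       # simulated observed data
n <- length(y)
S <- 4000                              # number of posterior draws

# Posterior for the mean under a flat prior and known sd = 1.
mu_draws <- rnorm(S, mean = mean(y), sd = 1 / sqrt(n))

# S x n matrix: log-likelihood of each observation under each draw.
log_lik <- sapply(y, function(yi) dnorm(yi, mean = mu_draws, sd = 1, log = TRUE))

# In-sample accuracy: log pointwise predictive density.
lppd <- sum(log(colMeans(exp(log_lik))))

# Penalty: effective number of parameters (posterior variance of the log-likelihood).
p_waic <- sum(apply(log_lik, 2, var))

elpd_waic <- lppd - p_waic             # estimate of out-of-sample accuracy
waic      <- -2 * elpd_waic            # same quantity on the deviance scale

# Out-of-sample check on fresh data from the same process. This is usually
# lower than the in-sample lppd, though a single simulated data set can go either way.
y_new <- rnorm(50, mean = 1, sd = 1)
lppd_new <- sum(log(sapply(y_new, function(yi) mean(dnorm(yi, mean = mu_draws, sd = 1)))))
```

With one free parameter, `p_waic` should come out near 1 in this setup. When comparing models, differences in elpd should be read alongside their standard errors, which `loo_compare()` reports.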