- Learn how and why to create models with “interaction terms”.
- Learn how to interpret models with interaction.
- Learn how to get `ulam()` to compute derived parameters.
- Review our Bayesian workflow: selecting priors, checking our priors, using `link()` and `sim()` (including a new idea).

This material is based on Rethinking 8.
Every model so far in this book has assumed that each predictor has an independent association with the mean of the outcome. What if we want to allow the association to be conditional?…
To model deeper conditionality, where the importance of one predictor depends upon another predictor, we need interaction (also known as moderation). Interaction is a kind of conditioning, a way of allowing parameters (really their posterior distributions) to be conditional on further aspects of the data. The simplest kind of interaction, a linear interaction, is built by extending the linear modeling strategy to parameters within the linear model. So it is akin to placing epicycles on epicycles in the Ptolemaic and Copernican models. It is descriptive, but very powerful.
More generally, interactions are central to most statistical models beyond the cozy world of Gaussian outcomes and linear models of the mean. In generalized linear models (GLMs, Chapter 10 and onwards), even when one does not explicitly define variables as interacting, they will always interact to some degree. Multilevel models induce similar effects. [@mcelreathStatisticalRethinkingBayesian2020, p. 238, **emphasis** in the original]
data(rugged, package = "rethinking")

# make the log version of the criterion and rescale the key variables
Rugged <-
  rugged %>%
  mutate(
    log_gdp = log(rgdppc_2000),
    log_gdp_std = log_gdp / mean(log_gdp, na.rm = TRUE),
    rugged_std = rugged / max(rugged, na.rm = TRUE),
    rugged_std_c = rugged_std - mean(rugged_std, na.rm = TRUE)
  )

# keep only the countries that have GDP data
RuggedGDP <-
  Rugged %>%
  filter(!is.na(rgdppc_2000))
# sanity checks; these should be linear
gf_point(rugged_std ~ rugged, data = Rugged)
gf_point(log_gdp_std ~ log_gdp, data = Rugged)
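We can also confirm the rescaling numerically. A minimal check in base R: log_gdp_std is GDP as a proportion of the international average, so it should average to exactly 1, and rugged_std is ruggedness as a proportion of the maximum, so it should top out at exactly 1.

mean(Rugged$log_gdp_std, na.rm = TRUE)  # expect 1
max(Rugged$rugged_std, na.rm = TRUE)    # expect 1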
RuggedGDP %>%
  gf_point(log_gdp ~ rugged | cont_africa) %>%
  gf_smooth(method = "lm", se = TRUE)
From the plot above, it appears that the direction of the association between ruggedness and GDP differs for African and non-African countries (and also that there is a lot of nation-to-nation variability, even among countries with similar ruggedness, so clearly terrain is not the only thing that influences GDP).

Our goal is to fit a model that can handle this sort of relationship. But first, some DAGs.
library(ggdag)
dag_coords <-
  tibble(
    name = c("R", "G", "C", "U"),
    x = c(1, 2, 3, 2),
    y = c(2, 2, 2, 1)
  )

dagify(R ~ U,
       G ~ R + U + C,
       coords = dag_coords) %>%
  gg_dag()
Let’s ignore \(U\) for now… Focus instead on the implication that \(R\) and \(C\) both influence \(G\). This could mean that they are independent influences or rather that they interact (one moderates the influence of the other). The DAG does not display an interaction. That’s because DAGs do not specify how variables combine to influence other variables. The DAG above implies only that there is some function that uses \(R\) and \(C\) to generate \(G\). In typical notation, \(G = f(R, C)\). (p. 240)
It’s generally not a good idea to split up your data and run separate analyses when examining an interaction. McElreath listed four reasons why:
1. “There are usually some parameters, such as \(\sigma\), that the model says do not depend in any way upon continent. By splitting the data table, you are hurting the accuracy of the estimates for these parameters” (p. 241).
2. “In order to acquire probability statements about the variable you used to split the data, `cont_africa`, in this case, you need to include it in the model” (p. 241).
3. “We may want to use information criteria or another method to compare models” (p. 241).
4. “Once you begin using multilevel models ([Chapter 13][Models With Memory]), you’ll see that there are advantages to borrowing information across categories like ‘Africa’ and ‘not Africa’” (p. 241).
Here’s the DAG for a fuller model for the data.
dag_coords <-
  tibble(
    name = c("G", "R", "H", "C", "U"),
    x = c(1, 1.5, 2.5, 3.5, 1),
    y = c(3, 2, 2, 2, 1)
  )

dagify(G ~ R + U + H,
       R ~ U,
       H ~ R + U + C,
       coords = dag_coords) %>%
  gg_dag()
“The data contain a large number of potential confounds that you might consider. Natural systems like this are terrifyingly complex” (p. 241). In the words of the great Dan Simpson, “Pictures and fear–this is what we do [in statistics]; we draw pictures and have fear” (see here).
Our first Bayesian model will follow the form
\[\begin{align*} \text{log_gdp_std}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \alpha + \beta \left (\text{rugged_std}_i - \overline{\text{rugged_std}} \right ) \\ \alpha & \sim \operatorname{Normal}(1, 1) \\ \beta & \sim \operatorname{Normal}(0, 1) \\ \sigma & \sim \operatorname{Exponential}(1). \end{align*}\]
Now fit the model.
u8.1 <-
  ulam(
    data = RuggedGDP %>% select(log_gdp_std, rugged_std_c),
    alist(
      log_gdp_std ~ dnorm(mu, sigma),
      mu <- a + b * rugged_std_c,
      a ~ dnorm(1, 1),
      b ~ dnorm(0, 1),
      sigma ~ dexp(1)
    ),
    iter = 2000, warmup = 1000, chains = 4, cores = 4,
    seed = 8,
    file = "fits/u8.1")
And look at some consequences of our prior – which is clearly too vague.
Prior8.1 <- extract.prior(u8.1)
Prior8.1d <- Prior8.1 %>% as.data.frame()

set.seed(8)
Prior8.1d %>%
  slice_sample(n = 50) %>%
  expand(nesting(a, b),
         rugged_std_c = c(-2, 2)) %>%
  mutate(log_gdp_std = a + b * rugged_std_c,
         rugged_std = rugged_std_c + mean(RuggedGDP$rugged_std)) %>%
  gf_abline(slope = ~ b, intercept = ~ a) %>%
  # book adds this line, but it isn't really needed
  # gf_abline(intercept = ~ 1.3, slope = ~ -0.6, data = NA,
  #           color = "steelblue", size = 2, alpha = 0.8) %>%
  gf_hline(yintercept = ~ max, color = "red", linetype = "dashed",
           data = df_stats(~ log_gdp_std, data = RuggedGDP)) %>%
  gf_hline(yintercept = ~ min, color = "red", linetype = "dashed",
           data = df_stats(~ log_gdp_std, data = RuggedGDP)) %>%
  gf_lims(x = c(0, 1), y = c(0.5, 1.5)) %>%
  gf_labs(
    subtitle = "Intercept ~ dnorm(1, 1)\nb ~ dnorm(0, 1)",
    x = "ruggedness",
    y = "log GDP (prop of mean)")
Let’s tighten the prior. Reasonable slopes should have an absolute value less than \((1.3 - 0.7) / 1 = 0.6\): the red dashed lines (the observed extremes of log_gdp_std) are only 0.6 apart, while rugged_std spans a range of 1, so any line with a steeper slope must exit that band. Similarly, the intercept should be concentrated near 1.
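As a quick check on that reasoning, we can ask how often the proposed Normal(0, 0.3) prior produces a slope steeper than 0.6. A minimal simulation sketch:

set.seed(8)
b_prior <- rnorm(1e4, 0, 0.3)
# proportion of prior slopes with absolute value above the plausible
# bound of 0.6; exactly 2 * pnorm(-0.6, 0, 0.3), about 0.045
mean(abs(b_prior) > 0.6)

So only about 5% of slopes drawn from this prior are implausibly steep.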
\[\begin{align*} \text{log_gdp_std}_i & \sim \operatorname{Normal} (\mu_i, \sigma) \\ \mu_i & = \alpha + \beta \left (\text{rugged_std}_i - \overline{\text{rugged_std}} \right ) \\ \alpha & \sim \operatorname{Normal}(1, 0.1) \\ \beta & \sim \operatorname{Normal}(0, 0.3) \\ \sigma & \sim \operatorname{Exponential}(1). \end{align*}\]
u8.1b <-
  ulam(
    data = RuggedGDP %>% select(log_gdp_std, rugged_std_c),
    alist(
      log_gdp_std ~ dnorm(mu, sigma),
      mu <- a + b * rugged_std_c,
      a ~ dnorm(1, 0.1),
      b ~ dnorm(0, 0.3),
      sigma ~ dexp(1)
    ),
    iter = 2000, warmup = 1000, chains = 4, cores = 4,
    seed = 8,
    file = "fits/u8.1b")
Prior8.1b <- extract.prior(u8.1b)
Prior8.1bd <- Prior8.1b %>% as.data.frame()

set.seed(8)
# `|` (presumably from patchwork) places the previous plot and this one side by side
last_plot() |
  Prior8.1bd %>%
  slice_sample(n = 50) %>%
  expand(nesting(a, b),
         rugged_std_c = c(-2, 2)) %>%
  mutate(log_gdp_std = a + b * rugged_std_c,
         rugged_std = rugged_std_c + mean(RuggedGDP$rugged_std)) %>%
  gf_abline(slope = ~ b, intercept = ~ a) %>%
  gf_hline(yintercept = ~ max, color = "red", linetype = "dashed",
           data = df_stats(~ log_gdp_std, data = RuggedGDP)) %>%
  gf_hline(yintercept = ~ min, color = "red", linetype = "dashed",
           data = df_stats(~ log_gdp_std, data = RuggedGDP)) %>%
  gf_lims(x = c(0, 1), y = c(0.5, 1.5)) %>%
  gf_labs(
    subtitle = "Intercept ~ dnorm(1, 0.1)\nb ~ dnorm(0, 0.3)",
    x = "ruggedness",
    y = "log GDP (prop of mean)")
precis(u8.1b)
## mean sd 5.5% 94.5% n_eff Rhat4
## a 0.99977808 0.1038403 0.83165813 1.1619360 2768.859 1.0004632
## b -0.00491654 0.2988620 -0.49934367 0.4669132 3019.639 1.0001908
## sigma 0.98934519 0.9388906 0.06508997 2.7950273 4096.693 0.9999293
mcmc_areas(stanfit(u8.1b), pars = vars(-lp__))
Note: there is a lack of symmetry here because we have not centered ruggedness. That’s a deficiency here, but we’ll be moving on to different models in a moment anyway, so we’ll stick with what matches the text.
We already know how to let the intercept vary by region:
RuggedGDP <-
  RuggedGDP %>%
  mutate(cid = ifelse(cont_africa == 1, 1, 2))
In case you were curious, here’s a plot showing how the `cid` index works.
RuggedGDP %>%
  mutate(cid = paste("cid:", cid)) %>%
  arrange(cid, country) %>%
  group_by(cid) %>%
  mutate(rank = 1:n()) %>%
  gf_text(rank ~ cid | cid, label = ~ country, hjust = 0, size = 2) %>%
  gf_refine(
    scale_y_reverse(),
    theme_void()
  )
u8.2 <-
  ulam(
    data = RuggedGDP %>% select(log_gdp_std, rugged_std_c, cid),
    alist(
      log_gdp_std ~ dnorm(mu, sigma),
      mu <- a[cid] + b * rugged_std_c,
      gq> a_diff <- a[1] - a[2],  # compute a derived parameter!
      a[cid] ~ dnorm(1, 0.1),
      b ~ dnorm(0, 0.3),
      sigma ~ dexp(1)
    ),
    iter = 2000, warmup = 1000, chains = 4, cores = 4,
    seed = 8,
    file = "fits/u8.2")
precis(u8.2, depth = 2)
## mean sd 5.5% 94.5% n_eff Rhat4
## a[1] 0.88033770 0.015830798 0.8552048 0.90608402 4987.138 1.0000169
## a[2] 1.04898385 0.010403173 1.0322001 1.06541641 5111.589 1.0004219
## b -0.05871445 0.058522536 -0.1526484 0.03449917 4695.164 0.9991188
## sigma 0.11426592 0.006432857 0.1045315 0.12502328 4221.346 1.0012276
## a_diff -0.16864615 0.018953387 -0.1992293 -0.13806003 5190.350 0.9996134
mcmc_areas(stanfit(u8.2), pars = vars(-lp__))
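The gq> line asked Stan to compute a_diff as a generated quantity, but we can always recover the same thing by hand from the posterior samples. A quick check (extract.samples() returns a as a matrix with one column per index; the mean should match the a_diff row of precis(u8.2, depth = 2) above):

Post8.2 <- extract.samples(u8.2)
# difference between the two intercepts, summarized from the raw samples
mean_hdi(Post8.2$a[, 1] - Post8.2$a[, 2])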
We can do the same thing to get varying slopes.
u8.3 <-
  ulam(
    data = RuggedGDP %>% select(log_gdp_std, rugged_std_c, cid),
    alist(
      log_gdp_std ~ dnorm(mu, sigma),
      mu <- a[cid] + b[cid] * rugged_std_c,
      gq> a_diff <- a[1] - a[2],  # compute a derived parameter!
      gq> b_diff <- b[1] - b[2],  # compute a derived parameter!
      a[cid] ~ dnorm(1, 0.1),
      b[cid] ~ dnorm(0, 0.3),
      sigma ~ dexp(1)
    ),
    iter = 2000, warmup = 1000, chains = 4, cores = 4,
    seed = 8,
    file = "fits/u8.3")
precis(u8.3, depth = 2)
## mean sd 5.5% 94.5% n_eff Rhat4
## a[1] 0.8875654 0.016362182 0.86137357 0.9135736 5745.755 0.9998161
## a[2] 1.0495356 0.010118417 1.03343968 1.0658389 5643.773 0.9998597
## b[1] 0.1609868 0.095189786 0.00890525 0.3136438 6101.538 0.9994434
## b[2] -0.1771275 0.069909247 -0.28822200 -0.0660444 5450.749 0.9993388
## sigma 0.1114961 0.006270556 0.10191524 0.1219611 5313.728 0.9996157
## b_diff 0.3381143 0.116964433 0.14677333 0.5259943 5919.085 0.9994361
## a_diff -0.1619702 0.019316649 -0.19186713 -0.1311510 5843.591 1.0000820
mcmc_areas(stanfit(u8.3), pars = vars(-lp__))
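Reason 3 in McElreath’s list above for not splitting the data was model comparison, and since we fit all three models to the same rows, a comparison is now easy. A sketch using rethinking’s compare(); we’d expect u8.3, the interaction model, to carry nearly all of the weight:

# PSIS-based comparison of the pooled, varying-intercept, and
# varying-slope models
compare(u8.1b, u8.2, u8.3, func = PSIS)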
The models above all assume that the nation-to-nation variability about the average (which is a linear function of ruggedness) is the same for African and non-African countries. We can introduce varying \(\sigma\)s in the same way.
u8.3b <-
  ulam(
    data = RuggedGDP %>% select(log_gdp_std, rugged_std_c, cid),
    alist(
      log_gdp_std ~ dnorm(mu, sigma[cid]),
      mu <- a[cid] + b[cid] * rugged_std_c,
      gq> a_diff <- a[1] - a[2],  # compute a derived parameter!
      gq> b_diff <- b[1] - b[2],  # compute a derived parameter!
      gq> s_diff <- sigma[1] - sigma[2],  # compute a derived parameter!
      a[cid] ~ dnorm(1, 0.1),
      b[cid] ~ dnorm(0, 0.3),
      sigma[cid] ~ dexp(1)
    ),
    iter = 2000, warmup = 1000, chains = 4, cores = 4,
    seed = 8,
    file = "fits/u8.3b")
precis(u8.3b, depth = 2)
## mean sd 5.5% 94.5% n_eff Rhat4
## a[1] 0.887106837 0.016088828 0.861899651 0.91342904 4244.542 0.9998226
## a[2] 1.049481253 0.010429383 1.032939926 1.06608680 5454.578 0.9993752
## b[1] 0.161106667 0.095468222 0.009860025 0.31113545 3899.510 0.9996934
## b[2] -0.172934957 0.072716120 -0.290041963 -0.05317955 4602.561 0.9995856
## sigma[1] 0.109875248 0.011968130 0.092457451 0.13053850 4503.385 0.9996408
## sigma[2] 0.113468339 0.007502963 0.102264191 0.12588803 3942.521 0.9999287
## s_diff -0.003593091 0.014073450 -0.024964941 0.01932785 4310.620 1.0000549
## b_diff 0.334041624 0.120885637 0.141881470 0.52792888 4050.770 0.9996594
## a_diff -0.162374416 0.018962655 -0.192007368 -0.13217686 4755.878 0.9997672
mcmc_areas(stanfit(u8.3b), pars = vars(-lp__))
In this case, the data don’t suggest that allowing for different \(\sigma\)’s is important, and in general we don’t want to split up all the parameters if it isn’t necessary. The point here is just that it is no harder to do this for other types of parameters.
And, of course, it would work with any number of groups as well.
Post8.3 <-
  extract.samples(u8.3) %>%
  data.frame()

head(Post8.3, 3)
## a.1 a.2 b.1 b.2 sigma b_diff a_diff
## 1 0.8837985 1.035231 0.1689051 -0.2754906 0.1044524 0.4443957 -0.1514322
## 2 0.8701813 1.042357 0.1108902 -0.1242925 0.1055921 0.2351828 -0.1721758
## 3 0.9117889 1.044288 0.1923169 -0.1063833 0.1168874 0.2987002 -0.1324993
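With the samples in a data frame, posterior probabilities about the derived parameters are just proportions of rows. For example (given the precis() output above, this should be very nearly 1):

# posterior probability that the Africa slope exceeds the non-Africa slope
mean(Post8.3$b_diff > 0)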
CFD <-
  expand_grid(cid = 1:2, rugged_std = seq(-0.1, 1.1, by = 0.2)) %>%
  mutate(rugged_std_c = rugged_std - mean(RuggedGDP$rugged_std))

CFD %>% slice_sample(n = 3)
## # A tibble: 3 x 3
## cid rugged_std rugged_std_c
## <int> <dbl> <dbl>
## 1 2 0.7 0.529
## 2 2 -0.1 -0.271
## 3 1 0.7 0.529
Avg8.3c <-
  link(u8.3, data = CFD) %>%
  apply(2, mean_hdi) %>%
  bind_rows() %>%
  bind_cols(CFD)

Avg8.3c
## y ymin ymax .width .point .interval cid rugged_std
## 1 0.8439895 0.7910591 0.8958151 0.95 mean hdi 1 -0.1
## 2 0.8761868 0.8445413 0.9071146 0.95 mean hdi 1 0.1
## 3 0.9083842 0.8672297 0.9562943 0.95 mean hdi 1 0.3
## 4 0.9405816 0.8643732 1.0141740 0.95 mean hdi 1 0.5
## 5 0.9727789 0.8643751 1.0844239 0.95 mean hdi 1 0.7
## 6 1.0049763 0.8582959 1.1518254 0.95 mean hdi 1 0.9
## 7 1.0371737 0.8562335 1.2232636 0.95 mean hdi 1 1.1
## 8 1.0974805 1.0538850 1.1397396 0.95 mean hdi 2 -0.1
## 9 1.0620550 1.0399795 1.0840209 0.95 mean hdi 2 0.1
## 10 1.0266295 0.9991856 1.0519326 0.95 mean hdi 2 0.3
## 11 0.9912040 0.9408471 1.0379198 0.95 mean hdi 2 0.5
## 12 0.9557785 0.8853339 1.0337376 0.95 mean hdi 2 0.7
## 13 0.9203530 0.8183743 1.0210440 0.95 mean hdi 2 0.9
## 14 0.8849275 0.7543599 1.0120794 0.95 mean hdi 2 1.1
## rugged_std_c
## 1 -0.27068011
## 2 -0.07068011
## 3 0.12931989
## 4 0.32931989
## 5 0.52931989
## 6 0.72931989
## 7 0.92931989
## 8 -0.27068011
## 9 -0.07068011
## 10 0.12931989
## 11 0.32931989
## 12 0.52931989
## 13 0.72931989
## 14 0.92931989
countries <-
  c("Equatorial Guinea", "South Africa", "Seychelles", "Swaziland",
    "Lesotho", "Rwanda", "Burundi", "Luxembourg", "Greece",
    "Switzerland", "Lebanon", "Yemen", "Tajikistan", "Nepal")

p8.3a <-
  Avg8.3c %>%
  mutate(cid_fct = factor(cid, labels = c("Africa", "not Africa"))) %>%
  gf_ribbon(ymin + ymax ~ rugged_std, group = ~ cid, fill = ~ cid_fct) %>%
  gf_line(y ~ rugged_std, group = ~ cid, color = ~ cid_fct) %>%
  gf_point(log_gdp_std ~ rugged_std, data = RuggedGDP, inherit = FALSE) %>%
  gf_text(log_gdp_std ~ rugged_std, label = ~ country, alpha = 0.6,
          data = RuggedGDP %>% filter(country %in% countries), inherit = FALSE) %>%
  gf_facet_grid(~ cid) %>%
  gf_labs(y = "expected relative log GDP")

p8.3a
LA <- link(u8.3, data = CFD %>% filter(cid == 1))
LN <- link(u8.3, data = CFD %>% filter(cid == 2))
dim(LA)
## [1] 4000 7
dim(LN)
## [1] 4000 7
Avg8.3cd <-
  (LA - LN) %>%           # subtract matrices to get differences in the values of mu
  apply(2, mean_hdi) %>%  # now get HDIs for those differences
  bind_rows() %>%
  bind_cols(CFD %>% filter(cid == 1) %>% select(-cid))
Avg8.3cd
## y ymin ymax .width .point .interval rugged_std
## 1 -0.25349102 -0.31772624 -0.18639427 0.95 mean hdi -0.1
## 2 -0.18586815 -0.22149419 -0.14603743 0.95 mean hdi 0.1
## 3 -0.11824528 -0.17057109 -0.06856547 0.95 mean hdi 0.3
## 4 -0.05062242 -0.14095649 0.03874057 0.95 mean hdi 0.5
## 5 0.01700045 -0.12228948 0.14449932 0.95 mean hdi 0.7
## 6 0.08462332 -0.09673829 0.26140209 0.95 mean hdi 0.9
## 7 0.15224619 -0.09433627 0.35609496 0.95 mean hdi 1.1
## rugged_std_c
## 1 -0.27068011
## 2 -0.07068011
## 3 0.12931989
## 4 0.32931989
## 5 0.52931989
## 6 0.72931989
## 7 0.92931989
p8.3b <-
  Avg8.3cd %>%
  gf_ribbon(ymin + ymax ~ rugged_std) %>%
  gf_line(y ~ rugged_std) %>%
  gf_labs(y = "expected difference in relative log GDP")
p8.3b
If you’re unfamiliar with Buridan’s ass (mentioned in the text), here’s a brief clip to get you up to speed. With that ass still on your mind, recall the model for \(\mu_i\) from the last example.
The previous two plots illustrate a kind of symmetry in our model:
p8.3a
p8.3b
This is always the case. If there is interaction one way around, there is interaction the other way around.
McElreath goes on to show another way we can express this model. We’ve been thinking of it as
\[ \mu_i = \alpha_{\text{cid}[i]} + \beta_{\text{cid}[i]} R_i \;. \]
With this model, it is equally true that the slope is conditional on the intercept as it is that the intercept is conditional on the slope. Another way to express the model is
\[\begin{align*} \mu_i & = \underbrace{(2 - \text{cid}_{i}) \left (\alpha_1 + \beta_1 R_i \right )}_{\text{cid}[i] = 1} \\ & \;\;\; + \underbrace{(\text{cid}_{i} - 1) \left (\alpha_2 + \beta_2 R_i \right )}_{\text{cid}[i] = 2}, \end{align*}\]
where the first term vanishes when \(\text{cid}_i = 2\) and the second term vanishes when \(\text{cid}_i = 1\). In contrast to the plots above, we can re-express this equation as saying “The association of being in Africa with log GDP depends upon terrain ruggedness” (p. 251, emphasis in the original).
That’s true, although he doesn’t really make much use of that formulation.
He seems instead to focus on the difference:
\[ \underbrace{(\alpha_1 + \beta_1 R_i) - (\alpha_2 + \beta_2 R_i)}_{\text{difference b/w Af and non-Af GDP}} = \underbrace{(\alpha_1 - \alpha_2)}_{\text{when } R_i = 0} + \underbrace{(\beta_1 - \beta_2)}_{\text{how difference depends on } R_i} R_i \]
This perspective on the GDP and terrain ruggedness is completely consistent with the previous perspective. It’s simultaneously true in these data (and with this model) that (1) the influence of ruggedness depends upon continent and (2) the influence of continent depends upon ruggedness.
Simple interactions are symmetric, just like the choice facing Buridan’s ass. Within the model, there’s no basis to prefer one interpretation over the other, because in fact they are the same interpretation. But when we reason causally about models, our minds tend to prefer one interpretation over the other, because it’s usually easier to imagine manipulating one of the predictor variables instead of the other. (pp. 251–252)
In any case, this sort of algebraic rearrangement can be very useful.
Instead of coding continent as 1 or 2 and using that as an index, let’s code it as \(C =\) 0 or 1 and see how that affects our model.
\[\begin{align*} \mu_G &= \fbox{intercept} + \fbox{slope} R \\ &= \fbox{$\alpha_0 + \alpha_C C$} + \fbox{$\beta_0 +\beta_C C$} R \\ &= \alpha_0 + \alpha_C C + \beta_0 R + \beta_C C\cdot R \\ &= \beta_0 + \beta_C C + \beta_R R + \beta_{CR} C\cdot R \\ \end{align*}\]
The last line renames the parameters to a more traditional labeling; note that \(\beta_0\) has a different meaning in the last line than it had in the previous two lines.
At first this seems equivalent to what we did before. For example, we can write an expression for the mean response in each group and see that we have lines with varying slopes and intercepts, just as before.
| \(C = 0\) | \(C = 1\) |
|---|---|
| \(\beta_0 + \beta_R R\) | \((\beta_0 + \beta_C) + (\beta_R + \beta_{CR}) R\) |
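For concreteness, here is a sketch of how this version might be fit with ulam(). (The model name u8.3c and the specific priors on the adjustment parameters are our own choices here, not from the text.)

u8.3c <-
  ulam(
    data = RuggedGDP %>% select(log_gdp_std, rugged_std_c, cont_africa),
    alist(
      log_gdp_std ~ dnorm(mu, sigma),
      mu <- b0 + bC * cont_africa + bR * rugged_std_c +
              bCR * cont_africa * rugged_std_c,
      b0 ~ dnorm(1, 0.1),
      bC ~ dnorm(0, 0.1),   # adjustment to the intercept; but what prior is right?
      bR ~ dnorm(0, 0.3),
      bCR ~ dnorm(0, 0.3),  # adjustment to the slope; same question
      sigma ~ dexp(1)
    ),
    iter = 2000, warmup = 1000, chains = 4, cores = 4,
    seed = 8)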
But there are three disadvantages to this approach.

1. It is harder to create good priors. What sort of distribution should the “adjustments” have? And how does that determine the prior for the slope and intercept when \(C = 1\)?
2. This model presumes that there is more uncertainty in the slope and in the intercept when \(C = 1\) than when \(C = 0\): when \(C = 1\), we get all the uncertainty from the parameters involved when \(C = 0\), plus additional uncertainty from the adjustment parameters.
3. The index variable approach sets us up better for hierarchical models (which are coming soon).
But you should be familiar with both approaches, and you will see indicator variables (also called dummy variables; when there are more than two categories, more than one is needed) a lot, especially in non-Bayesian analyses. One small advantage of the second approach is that it provides a nice shorthand notation for our model:
1 + C + R + C:R
= 1 + C * R
Note that the interaction is denoted by `:`, and `*` represents the interaction plus both “main effects”.
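To see this shorthand in action outside of ulam(), here is a quick non-Bayesian sketch; the two calls below fit the same four-coefficient model (intercept, C, R, and the C-by-R interaction):

lm(log_gdp_std ~ cont_africa + rugged_std_c + cont_africa:rugged_std_c, data = RuggedGDP)
lm(log_gdp_std ~ cont_africa * rugged_std_c, data = RuggedGDP)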