Learn how and why to create models with “interaction terms” with two numerical variables.
Learn how to interpret models with interaction.
This material is based on Rethinking 8
The previous example was particularly simple because there were only two possible values for one of the our predictors. We would like to generalize the concept of interaction to include
Look at the tulips
data, which were adapted from @grafenModernStatisticsLife2002.
Prep the data
data(tulips, package = "rethinking")
Tulips <-
tulips %>%
mutate(
blooms_std = blooms / max(blooms),
water_cent = water - mean(water),
shade_cent = shade - mean(shade))
With the variables in hand, the basic model is
\[
B = f(W, S) \;,
\] where \(B\) = blooms
, \(W\) = water
, \(S\) = shade
, and \(f(\cdot)\) indicates a function. We can also express this as \(W \rightarrow B \leftarrow S\). Neither expression clarifies whether the effects of \(W\) and \(S\) are additive or conditional on each other in some way. We might express an unconditional (additive) model as
\[\begin{align*} \text{blooms_std}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \alpha + \beta_1 \text{water_cent}_i + \beta_2 \text{shade_cent}_i \\ \alpha & \sim \operatorname{Normal}(0.5, 1) \\ \beta_1 & \sim \operatorname{Normal}(0, 1) \\ \beta_2 & \sim \operatorname{Normal}(0, 1) \\ \sigma & \sim \operatorname{Exponential}(1), \end{align*}\]
where \(\text{water_cent}_i = \left (\text{water}_i - \overline{\text{water}} \right )\) and \(\text{shade_cent}_i = \left (\text{shade}_i - \overline{\text{shade}} \right )\).
Even though “the intercept \(\alpha\) must be greater than zero and less than one, … this prior assigns most of the probability outside that range” (p. 254).
If we want 95% of a normal distribution to live on the interval \([0,1]\), then we could choose a mean of 0.5 and a standard deviation of 0.25.
pnorm(1, 0.5, 1) - pnorm(0, 0.5, 1)
## [1] 0.3829249
pnorm(1, 0.5, 0.25) - pnorm(0, 0.5, 0.25)
## [1] 0.9544997
Our two predictors take on values 1, 2, or 3 with equal probability. These are rescaled to take values -1, 0, or 1.
gf_histogram( ~ water, data = Tulips)
gf_histogram( ~ shade, data = Tulips)
range(Tulips$water_cent)
## [1] -1 1
range(Tulips$shade_cent)
## [1] -1 1
Although the experiment only used 3 values for each of these, it is possible to supply intermediate amounts of water or shade, and our model will be able to make predictions for those if we treat the predictors as numerical variables.
For priors on the slopes, we can think about a very strong effect of one of these preditors. A slope of 0.5 would move our predicted mean from the smallest observed blooms_std
to the largest. Slopes larger than that would be quite extreme. So again, a standard deviation of 0.25 seems reasonable.
\[\begin{align*} \text{blooms_std}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \alpha + \beta_1 \text{water_cent}_i + \beta_2 \text{shade_cent}_i \\ \alpha & \sim \operatorname{Normal}(0.5, 0.25) \\ \beta_1 & \sim \operatorname{Normal}(0, 0.25) \\ \beta_2 & \sim \operatorname{Normal}(0, 0.25) \\ \sigma & \sim \operatorname{Exponential}(1). \end{align*}\]
u8.4 <-
ulam(
data = Tulips,
alist(
blooms_std ~ dnorm(mu, sigma),
mu <- a + bw * water_cent + bs * shade_cent,
a ~ dnorm(0, 0.25),
c(bw, bs) ~ dnorm(0, 0.25),
sigma ~ dexp(1)
),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 8,
file = "fits/u8.4")
Check the model summary.
precis(u8.4)
## mean sd 5.5% 94.5% n_eff Rhat4
## a 0.3503720 0.03502050 0.2933151 0.40635720 4694.069 0.9998423
## bs -0.1119084 0.04181169 -0.1783555 -0.04543695 4475.947 1.0000046
## bw 0.2036052 0.04093695 0.1370650 0.26923754 4425.063 0.9994021
## sigma 0.1766912 0.02733281 0.1392921 0.22430748 3558.581 0.9996895
stanfit(u8.4) %>% mcmc_areas(pars = vars(-lp__))
So what does interaction mean here? We want to modify
\[ \mu_i = \beta_0 + \beta_W W_i + \beta_S S_i \\ \] to express that the effect of water on blossoms (\(\beta_W\)) should depend (linearly for now) on the amount of shade (\(S\)). But we know how to express something that depends (linearly) on the amount of shade: \(\beta_0 + \beta_S S\). So let’s do a little renaming and insert that into our model:
\[\begin{align*} \mu_i &= \beta_0 + (\beta_W + \beta_{SW} S_i) W_i + \beta_S S_i \\ &= \beta_0 + \beta_W W_i + \beta_{SW} S_i W_i + \beta_S S_i \\ &= \beta_0 + \beta_W W_i + (\beta_S + \beta_{SW} W_i) S_i \end{align*}\]
The algebraic rearrangement (a) explains our choice of subscript notation,
(b) shows the form in which this sort of interaction model is often predicted, (c) makes it clear that this is symmetric in \(S\) and \(W\), and (d) helps us understand what the parameters represent and how we might choose priors.
Note: McElreath also introduces another notation:
These are the parenthesized expressions in the equations above.
Using the \(\gamma\) notation, we can express an interaction between water_cent
and shade_cent
by
\[\begin{align*} \mu_i & = \alpha + \color{#009E73}{\gamma_{1, i}} \text{water_cent}_i + \beta_2 \text{shade_cent}_i \\ \color{#009E73}{\gamma_{1, i}} & \color{#009E73}= \color{#009E73}{\beta_1 + \beta_3 \text{shade_cent}_i}, \end{align*}\]
where both \(\mu_i\) and \(\gamma_{1, i}\) get a linear model. We could do the converse by switching the positions of water_cent
and shade_cent
. If we substitute the equation for \(\gamma_{1, i}\) into the equation for \(\mu_i\), we get
\[\begin{align*} \mu_i & = \alpha + \color{#009E73}{\underbrace{(\beta_1 + \beta_3 \text{shade_cent}_i)}_{\gamma_{1, i}}} \text{water_cent}_i + \beta_2 \text{shade_cent}_i \\ & = \alpha + \color{#009E73}{\beta_1} \text{water_cent}_i + (\color{#009E73}{\beta_3 \text{shade_cent}_i} \cdot \text{water_cent}_i) + \beta_2 \text{shade_cent}_i \\ & = \alpha + \color{#009E73}{\beta_1} \text{water_cent}_i + \beta_2 \text{shade_cent}_i + \color{#009E73}{\beta_3} (\color{#009E73}{\text{shade_cent}_i} \cdot \text{water_cent}_i), \end{align*}\]
where \(\beta_3\) is the interaction term which makes water_cent
and shade_cent
conditional on each other. If we use the same priors as before, we might write the full equation for our interaction model as
\[\begin{align*} \text{blooms_std}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \alpha + \color{#009E73}{\beta_1} \text{water_cent}_i + \beta_2 \text{shade_cent}_i + \color{#009E73}{\beta_3 \text{shade_cent}_i} \cdot \text{water_cent}_i\\ \beta_0 & \sim \operatorname{Normal}(0.5, 0.25) \\ \beta_S & \sim \operatorname{Normal}(0, 0.25) \\ \beta_W & \sim \operatorname{Normal}(0, 0.25) \\ \beta_{SW} & \sim \operatorname{Normal}(0, 0.25) \\ \sigma & \sim \operatorname{Exponential}(1). \end{align*}\]
Fit the model.
u8.5 <-
ulam(
data = Tulips,
alist(
blooms_std ~ dnorm(mu, sigma),
mu <- b0 + bw * water_cent + bs * shade_cent + bws * water_cent * shade_cent,
b0 ~ dnorm(0, 0.25),
c(bw, bs, bws) ~ dnorm(0, 0.25),
sigma ~ dexp(1)
),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 8,
file = "fits/u8.5")
precis(u8.5)
## mean sd 5.5% 94.5% n_eff Rhat4
## b0 0.3522605 0.02780488 0.3073529 0.39693740 4444.975 1.0006375
## bws -0.1423041 0.04244937 -0.2094041 -0.07486695 4279.999 1.0003200
## bs -0.1122193 0.03474999 -0.1671495 -0.05603002 4351.081 0.9993499
## bw 0.2060801 0.03371968 0.1520878 0.25853715 4675.903 1.0014986
## sigma 0.1433543 0.02316433 0.1117857 0.18362052 3683.029 1.0003516
stanfit(u8.5) %>% mcmc_areas(pars = vars(-lp__))
Making a plot like figure 8.7 takes a little thinking to get it right. Let’s start by taking a look at the plot, and then figure out how to make it.
# this function converts a matrix to a 3-column data frame
# think: row, column, value
mat2df <- function(m, rname = "row", cname = "col", vname = "value") {
if (is.null(colnames(m))) {
colnames(m) <- 1:ncol(m)
}
if (is.null(rownames(m))) {
rownames(m) <- 1:nrow(m)
}
m %>%
as_tibble() %>%
tibble::rownames_to_column(var = rname) %>%
pivot_longer(! any_of(rname), names_to = cname, values_to = vname)
}
# example use
M <- matrix(1:12, nrow = 3)
M
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
M %>% mat2df(rname = "R", cname = "C", vname = "number")
## # A tibble: 12 x 3
## R C number
## <chr> <chr> <int>
## 1 1 1 1
## 2 1 2 4
## 3 1 3 7
## 4 1 4 10
## 5 2 1 2
## 6 2 2 5
## 7 2 3 8
## 8 2 4 11
## 9 3 1 3
## 10 3 2 6
## 11 3 3 9
## 12 3 4 12
CFD <-
expand_grid(
shade_cent = -1:1,
water_cent = -1:1
) %>%
mutate(scenario = as.character(1:n()))
L8.5 <-
link(u8.5, data = CFD, n = 20)
# a bug? -- we are getting 4000 rows but we asked for 20
str(L8.5)
## num [1:4000, 1:9] 0.142 0.158 0.1 0.193 0.1 ...
# but it is easy enough to grab 20 randomly
p8.5 <-
L8.5[sample(1:4000, 20), ] %>%
mat2df(cname = "scenario", vname = "predicted_blooms_std") %>%
left_join(CFD) %>%
gf_line(predicted_blooms_std ~ shade_cent, group = ~row,
color = "steelblue", alpha = 0.7) %>%
gf_facet_grid( ~ water_cent, labeller = label_both) %>%
gf_labs(
title = "Model 8.5",
subtitle = "posterior"
)
## Joining, by = "scenario"
L8.4 <-
link(u8.4, data = CFD, n = 20)
p8.4 <-
L8.4[sample(1:4000, 20), ] %>%
mat2df(cname = "scenario", vname = "predicted_blooms_std") %>%
left_join(CFD) %>%
gf_line(predicted_blooms_std ~ shade_cent, group = ~row,
color = "steelblue", alpha = 0.7) %>%
gf_facet_grid( ~ water_cent, labeller = label_both) %>%
gf_labs(
title = "Model 8.4",
subtitle = "posterior"
)
## Joining, by = "scenario"
p8.4 / p8.5
We can make the same sort of plots from the priors as well – and maybe should have done so earlier. We just provide link()
with a matrix of prior samples.
Unfortunately, I haven’t yet figured out a more efficient way to generate prior samples from an ulam object. This way requires that the Stan model be recompiled, which is unfortunate. I’m guessing this is because ulam()
hard codes in the number of rows in the data set (see Stan code below), so we can’t just send in empty data and resample. [You can do this in native Stan, but they you need to pass in the sample size as well as the data so that sample size can be used to declare the relevant variables.]
L8.5prior <-
link(u8.5, data = CFD, post = extract.prior(u8.5))
## Trying to compile a simple C file
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include/" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/unsupported" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/StanHeaders/include/src/" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/StanHeaders/include/" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include/" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DBOOST_NO_AUTO_PTR -include '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp' -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1 -I/usr/local/include -fPIC -Wall -g -O2 -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
## namespace Eigen {
## ^
## ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
## ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
##
## SAMPLING FOR MODEL 'f3f053baef8ef4204d9c3c804be8bc80' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 2.6e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.26 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.042763 seconds (Warm-up)
## Chain 1: 0.033694 seconds (Sampling)
## Chain 1: 0.076457 seconds (Total)
## Chain 1:
L8.4prior <-
link(u8.4, data = CFD, post = extract.prior(u8.4))
## Trying to compile a simple C file
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include/" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/unsupported" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/StanHeaders/include/src/" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/StanHeaders/include/" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include/" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DBOOST_NO_AUTO_PTR -include '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp' -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1 -I/usr/local/include -fPIC -Wall -g -O2 -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
## namespace Eigen {
## ^
## ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
## ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
##
## SAMPLING FOR MODEL 'eb86daf23abc19774302bed503fea070' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 2e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.2 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.032866 seconds (Warm-up)
## Chain 1: 0.024415 seconds (Sampling)
## Chain 1: 0.057281 seconds (Total)
## Chain 1:
p8.5prior <-
L8.5prior[sample(1:1000, 20), ] %>%
mat2df(cname = "scenario", vname = "predicted_blooms_std") %>%
left_join(CFD) %>%
gf_line(predicted_blooms_std ~ shade_cent, group = ~row,
color = "steelblue", alpha = 0.7) %>%
gf_facet_grid( ~ water_cent, labeller = label_both) %>%
gf_labs(
title = "Model 8.5",
subtitle = "prior"
)
## Joining, by = "scenario"
p8.4prior <-
L8.4prior[ sample(1:1000, 20), ] %>%
mat2df(cname = "scenario", vname = "predicted_blooms_std") %>%
left_join(CFD) %>%
gf_line(predicted_blooms_std ~ shade_cent, group = ~row,
color = "steelblue", alpha = 0.7) %>%
gf_facet_grid( ~ water_cent, labeller = label_both) %>%
gf_labs(
title = "Model 8.4",
subtitle = "prior"
)
## Joining, by = "scenario"
p8.4prior / p8.5prior
What can we say about these priors, overall? They are harmless, but only weakly realistic. Most of the lines stay within the valid outcome space. But silly trends are not rare. We could do better. We could also do a lot worse, such as flat priors which would consider plausible that even a tiny increase in shade would kill all the tulips. If you displayed these priors to your colleagues, a reasonable summary might be, “These priors contain no bias towards positive or negative effects, and at the same time they very weakly bound the effects to realistic ranges.” (p. 260)
I need to do some mroe testing to make sure this does what I think it does, but it seems like this should work:
P <- quap(data = Tulips, u8.5@formula) %>% extract.prior()
L8.5prior2 <-
link(u8.5, post = P, data = CFD)
p8.4prior /
L8.5prior2[sample(1:1000, 20), ] %>%
mat2df(cname = "scenario", vname = "predicted_blooms_std") %>%
left_join(CFD) %>%
gf_line(predicted_blooms_std ~ shade_cent, group = ~row,
color = "steelblue", alpha = 0.7) %>%
gf_facet_grid( ~ water_cent, labeller = label_both) %>%
gf_labs(
title = "Model 8.5",
subtitle = "prior via quap()"
)
## Joining, by = "scenario"
Ever wonder what Stan code looks like? Here you go…
u8.5 %>%
stanfit() %>%
get_stancode() %>%
cat()
data{
vector[27] blooms;
int shade[27];
int water[27];
int bed[27];
vector[27] blooms_std;
int shade_cent[27];
int water_cent[27];
}
parameters{
real b0;
real bws;
real bs;
real bw;
real<lower=0> sigma;
}
model{
vector[27] mu;
sigma ~ exponential( 1 );
bw ~ normal( 0 , 0.25 );
bs ~ normal( 0 , 0.25 );
bws ~ normal( 0 , 0.25 );
b0 ~ normal( 0 , 0.25 );
for ( i in 1:27 ) {
mu[i] = b0 + bw * water_cent[i] + bs * shade_cent[i] + bws * water_cent[i] * shade_cent[i];
}
blooms_std ~ normal( mu , sigma );
}