- Design and fit a simple linear model.
- Use it to create this plot.
- Start thinking about why multiple regression will be important.
Interestingly, this model is quite confident that there is a positive association between the number of waffle houses and divorce rates. Does building a waffle house cause people to get divorced? That doesn’t sound right. So what explains this association? We’ll come back to this in a bit, but first, let’s see how the model depicted above was designed and fit.
The details of this model are not presented in the text, so we get to design it ourselves. Here is the outline of our model:
\[\begin{align*} D &\sim {\sf Norm}(\mu, \sigma) \\ \mu &= \beta_0 + \beta_1 W \\ \beta_0 &\sim ?? \\ \beta_1 &\sim ?? \\ \sigma &\sim ?? \\ \end{align*}\]
where \(D\) and \(W\) are the standardized divorce rate and number of waffle houses per million people.
Now we need some priors.
\[\beta_0 \sim {\sf Norm}(0, 0.2)\]
\[\beta_1 \sim {\sf Norm}(0, 0.3)\]
\[\sigma \sim {\sf Exp}(2)\]
Notice how standardization helped us design each of these priors: both variables have mean 0, so the intercept should be close to 0; a slope of \(\pm 1\) would mean that a one standard deviation change in waffle house density is associated with a full standard deviation change in divorce rate, so most of the prior mass should be on milder slopes; and since \(D\) has standard deviation 1, the residual standard deviation \(\sigma\) should not be much larger than 1. Putting it all together, our model is
\[\begin{align*} D &\sim {\sf Norm}(\mu, \sigma) \\ \mu &= \beta_0 + \beta_1 W \\ \beta_0 &\sim {\sf Norm}(0, 0.2) \\ \beta_1 &\sim {\sf Norm}(0, 0.3) \\ \sigma &\sim {\sf Exp}(2) \\ \end{align*}\]
We can do some prior predictive sampling to make sure that these seem reasonable (and make any adjustments that might be necessary).
PriorSamples <-
tibble(
beta_0 = rnorm(200, 0, 0.2),
beta_1 = rnorm(200, 0, 0.3),
sigma = rexp(200, 2)
)
gf_abline(slope = ~ beta_1, intercept = ~beta_0,
data = PriorSamples, alpha = 0.2) %>%
gf_lims(x = c(-3, 3), y = c(-3, 3)) %>%
gf_abline(slope = ~ c(-1, 1), intercept = ~c(0,0),
data = NA,
inherit = FALSE,
color = "red", linetype = "dotted")
The dotted red lines mark slopes of \(\pm 1\); for standardized variables that corresponds to a perfect correlation. The steepest lines shown approach that extreme, but most of them have modest positive or negative slopes. So our prior appears to be giving a reasonable range of plausible lines.
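If we want a number to go with the picture, we can ask how often the prior produces slopes steeper than \(\pm 1\). (This is a small check that is not in the text.)
# proportion of the 200 sampled prior slopes steeper than +/- 1
prop( ~ abs(beta_1) > 1, data = PriorSamples)
# exact prior probability under Norm(0, 0.3), for comparison
2 * pnorm(-1, mean = 0, sd = 0.3)
Both should be essentially zero, so slopes steeper than \(\pm 1\) are all but ruled out a priori.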
As it turns out, we can extract prior samples from a quap model just as easily as posterior samples, using extract.prior().
data(WaffleDivorce)
Waffles <-
WaffleDivorce %>%
mutate(
WaffleHousesPerCap = WaffleHouses / Population,
D = standardize(Divorce),
W = standardize(WaffleHousesPerCap),
A = standardize(MedianAgeMarriage),
M = standardize(Marriage)
)
m5.0 <- quap(
data = Waffles,
alist(
D ~ dnorm(mu, sigma),
mu <- beta_0 + beta_1 * W,
beta_0 ~ dnorm(0, 0.2),
beta_1 ~ dnorm(0, 0.3),
sigma ~ dexp(2)
)
)
PriorSamples2 <- m5.0 %>% extract.prior(n = 200) %>% as.data.frame()
gf_abline(slope = ~ beta_1, intercept = ~beta_0,
data = PriorSamples2, alpha = 0.2) %>%
gf_lims(x = c(-3, 3), y = c(-3, 3)) %>%
gf_abline(slope = ~ c(-1, 1), intercept = ~c(0,0),
data = NA,
inherit = FALSE,
color = "red", linetype = "dotted")
It would be easier to think about those lines on the natural scale. We can convert the intercept just like any other value of the response (and we will see shortly how unstandardize() makes this simpler). The slope \(\beta_1\) needs to be multiplied by the ratio of the standard deviations.
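To see why, undo the standardization. Writing \(\bar{D}, s_D\) and \(\bar{W}, s_W\) for the means and standard deviations used in standardizing, the line \(\mu = \beta_0 + \beta_1 W\) becomes, on the natural scale,
\[\bar{D} + s_D \mu = \bar{D} + s_D \beta_0 + \beta_1 \frac{s_D}{s_W} \left(\mathrm{WaffleHousesPerCap} - \bar{W}\right),\]
so the natural-scale slope is \(\beta_1 \, s_D / s_W\), and unstandardizing \(\beta_0\) gives \(\bar{D} + s_D \beta_0\), the expected divorce rate at the average waffle house density.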
gf_abline(
slope = ~ beta_1 * sd(Waffles$Divorce) / sd(Waffles$WaffleHousesPerCap),
intercept = ~ mean(Waffles$Divorce) + beta_0 * sd(Waffles$Divorce),
data = PriorSamples, alpha = 0.2) %>%
gf_lims(y = c(0, 20), x = c(0, 50))
Looks like all of the divorce rates stay in a plausible range (none are negative, for example) over the range of waffle house densities seen in our data. So our priors seem reasonable enough.
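How do standardize() and unstandardize() do their work? We can inspect their definitions in the rethinking package.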
rethinking::standardize
## function (x)
## {
## x <- scale(x)
## z <- as.numeric(x)
## attr(z, "scaled:center") <- attr(x, "scaled:center")
## attr(z, "scaled:scale") <- attr(x, "scaled:scale")
## return(z)
## }
## <bytecode: 0x7fd4b6697a20>
## <environment: namespace:rethinking>
rethinking::unstandardize
## function (x)
## {
## scale <- attr(x, "scaled:scale")
## center <- attr(x, "scaled:center")
## z <- x * scale + center
## return(as.numeric(z))
## }
## <bytecode: 0x7fd4b659fe80>
## <environment: namespace:rethinking>
That’s slick: the mean and standard deviation get stored (as “scaled:center” and “scaled:scale”), so we can access them later when unstandardizing.
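For example, we can read those attributes back off a standardized variable (the values, of course, depend on the data):
z <- standardize(Waffles$Divorce)
attr(z, "scaled:center")   # the mean of Divorce, saved by standardize()
attr(z, "scaled:scale")    # the standard deviation of Divorce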
But what if we want to standardize or unstandardize new data (like a sequence of points along the range of our data)? With a little bit of adjustment, we can make that possible.
standardize <-
function (
x, ref = x, center = mean(ref), scale = sd(ref)
)
{
x <- scale(x, center = center, scale = scale)
z <- as.numeric(x)
attr(z, "scaled:center") <- attr(x, "scaled:center")
attr(z, "scaled:scale") <- attr(x, "scaled:scale")
z
}
unstandardize <-
function (
x, ref = x,
scale = attr(ref, "scaled:scale"),
center = attr(ref, "scaled:center")
)
{
res <- x * scale + center
as.numeric(res)
}
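Here is a small illustration of the extra arguments (not from the text): we can standardize new waffle house densities relative to the observed data, or convert a standardized divorce rate back to divorces per 1000 marriages.
# standardize a grid of new values, using the observed data as the reference
standardize(seq(0, 50, by = 10), ref = Waffles$WaffleHousesPerCap)
# convert a standardized divorce rate of 0.5 back to the natural scale
unstandardize(0.5, ref = Waffles$D)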
That only helps for our intercept here, but will be useful for other things shortly.
gf_abline(
slope = ~ beta_1 * sd(Waffles$Divorce) / sd(Waffles$WaffleHousesPerCap),
intercept = ~ unstandardize(beta_0, Waffles$D),
data = PriorSamples, alpha = 0.2) %>%
gf_lims(y = c(0, 20), x = c(0, 50))
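Now that the priors look reasonable, let’s see what the posterior has to say.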
WafflePost <- m5.0 %>% extract.samples(n = 1e4)
gf_dens( ~ beta_1, data = WafflePost)
prop( ~ (beta_1 < 0), data = WafflePost)
## prop_TRUE
## 0.0041
tally( ~ (beta_1 < 0), data = WafflePost, format = "prop")
## (beta_1 < 0)
## TRUE FALSE
## 0.0041 0.9959
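Fewer than half a percent of the posterior samples have a negative slope, so, as advertised, the model is quite confident that the association between waffle houses and divorce rates is positive.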
precis(m5.0)
## mean sd 5.5% 94.5%
## beta_0 -6.136869e-07 0.10794351 -0.1725152 0.1725140
## beta_1 3.086196e-01 0.11929114 0.1179693 0.4992698
## sigma 9.066690e-01 0.08857643 0.7651068 1.0482313
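To recreate the plot from the start of this section, we use link() to get posterior samples of the average (standardized) divorce rate for each state, summarize them with 90% highest density intervals, and convert back to the natural scale for plotting.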
Link5.0 <- link(m5.0)
Avg5.0 <- apply(Link5.0, 2, mean_hdi, .width = 0.9) %>%
bind_rows() %>%
bind_cols(Waffles)
gf_point(Divorce ~ WaffleHousesPerCap, data = Waffles) %>%
gf_ribbon(unstandardize(ymin, D) + unstandardize(ymax, D) ~ WaffleHousesPerCap,
data = Avg5.0, inherit = FALSE, fill = "steelblue") %>%
gf_labs(
caption = "blue band is 90% credible interval for average response",
y = "Divorces per 1000 Marriages",
x = "Waffle Houses per Million People")
Simply put, multiple regression is regression with multiple explanatory variables. These go by other names as well, like predictors or regressors. One name I do not like is “independent variables”. Please do not use that term. Why? Because independence is an important statistical term that means something completely different, and typically predictors are not independent (in the statistical sense) of each other or of the response variable.
Our goal over the next several sessions will be to learn how to design, fit, and interpret multiple regression models. Much of this work would be the same even in a non-Bayesian class. So if you have seen multiple regression before, you will be adding to what you already know. And if you see multiple regression later, even in a non-Bayesian context, you will be able to apply things you are learning here. Of course, we will use the Bayesian framework for the regression models in this course.
It is the design and interpretation that will be the challenging part. These models are not technically much harder to fit than the models we have been using, and eventually we will learn some tools that make describing the model even easier (but they come with a different cost: learning how MCMC algorithms work and how they can fail).
Quote of the day (from Chapter 5 of McElreath):
Multiple regression can be worse than useless.
That sounds a bit scary, but all it is saying is what I just said: the hard part is not getting the computer to fit these models, it is designing and interpreting the models. And used without care, multiple regression models can lead us in the wrong direction.
There are at least two goals we could have for our model:
- Make predictions.
- Understand causal relationships.
A good model for one of these goals might not be the best model for the other goal. This is one of the little paradoxes of regression.