Multiple Regression

Goals for today

Bringing our previous examples into a systematic framework.

This material is based on Rethinking 6.4.

Confronting confounding

What is confounding?

Let’s define confounding as any context in which the association between an outcome \(Y\) and a predictor of interest \(X\) is not the same as it would be, if we had experimentally determined the values of \(X\)" (p. 183, emphasis in the original).

Recall that in an experiment, the researchers determine (usually using a random process) the values of some of the variables in the study.

Confounding, Experiments, and DAGS

Perhaps you are interested in the effect of education (E) on wages (W).

But it is likely that there are other things (U) that affect both wages and education:

If we do a simple regression of \(W\) on \(E\), the causal effect will be confounded by \(U\). The effect of \(U\) on both \(E\) and \(W\) creates additional correlation betwen \(E\) and \(W\) in addition to the direct effect of \(E\) on \(W\).

(Undirected) Paths in a DAG

We can understand this in terms of the (undirected) paths in our graph. We have two paths from \(E\) to \(W\) in our DAG:

\(E \to W\)
\(E \leftarrow U \to W\) (Note: we don’t have to follow the direction of the arrows.)

To isolate the path of interest (\(E \to W\)), we need to block the other path.

One way to do this is by running an experiment. If we could randomly assign education to the subjects, then we would have this DAG:

Manipulation removes the confounding, because it blocks the other path between \(E\) and \(W\)" (p. 184).

But it isn’t always possible to block paths via experiments. What are the other options? We could condition on \(U\) by including \(U\) in our model. Why does this work? Consider another DAG containing just the troublesome path:

\(E\) and \(W\) are indepedent, conditional on \(U\).

dagify(W ~ U, E ~ U) %>% 
  impliedConditionalIndependencies()

## E _||_ W | U

If we don’t know \(U\), then \(E\) provides information about \(W\) because both \(E\) and \(W\) are causally related to \(U\). But once we know \(U\), learning about \(E\) provides no additional information about \(W\). So conditioning on \(U\) blocks this path.

Shutting the backdoor.

Blocking confounding paths between some predictor \(X\) and some outcome \(Y\) is known as shutting the backdoor. We don’t want any spurious association sneaking in through a non-causal path that enters the back of the predictor \(X\). In the example above, the path \(E \leftarrow U \rightarrow W\) is a backdoor path, because it enters \(E\) with an arrow and also connects \(E\) to \(W\). This path is non-causal–intervening on \(E\) will not cause a change in \(W\) through this path–but it still produces an association between \(E\) and \(W\).

Now for some good news. Given a causal DAG, it is always possible to say which, if any, variables one must control for in order to shut all the backdoor paths. It is also possible to say which variables one must not control for, in order to avoid making new confounds. And–some more good news–there are only four types of variable relations that combine to form all possible paths. (p. 184, emphasis in the original)

The Four Elemental Confounds

Here are the representations for our four types of variable relations: the fork, pipe, collider, and descendant.

Since we will be thinking about these along a path, we might prefer to look at them like this.

Each node on a path is either a fork, a pipe, or a collider. (Note: this status depends on the path; the same node may play different roles on different paths.)

Opening and closing paths

Our goal is to have all backdoor paths closed.

Fork
- Example: \(\mbox{Growth} \leftarrow\mbox{Moisture} \to \mbox{Fungus}\)
- This is the “common cause” confound.
- \(X \perp \!\! \perp Y \mid Z\)
- Conditioning on \(Z\) blocks the path (of information) between \(X\) and \(Y\).
Pipe
- Example: \(\mbox{Treatment} \to \mbox{Fungus} \to \mbox{Growth}\)
- This is the “mediated effect” confound.
- \(X \perp \!\! \perp Y \mid Z\)
- Conditioning on \(Z\) blocks the path (of information) between \(X\) and \(Y\).
Collider
- Example: \(\mbox{Trustworthy} \to \mbox{Selection} \leftarrow\mbox{Newsworthy}\)
- This is the “common effect” confound.
- \(X \perp \!\!\perp\!\!\!\!\!/\;\,Y \mid Z\)
- Conditioning on \(Z\) opens the path (of information) between \(X\) and \(Y\).
Descendant
- Conditioning on a descendant is like a weak version of conditioning on its parent.
- \(D\) can be used as a proxy for \(Z\).

The recipe (page 185)

The recipe given in Statistical Rethinking is a little bit imprecise. Here’s a modified version:

List all paths connecting \(X\) (the potential cause of interest – the eXposure) and \(Y\) (the outcome).
Classify each path as causal or backdoor (non-causal)
- A backdoor (or non-causal) path = at least one arrow followed “backwards”
- Causal path = a path that follows all the arrows “forwards”
Classify each backdoor path by whether it is open or closed.
- open = no collider on path
- closed = collider on path
Close any open backdoor paths (if possible) by conditioning on one or more variables without closing any causal paths.
- Rule 1: Conditioning on any non-collider blocks/closes a path. [green]
- Rule 2: Not conditioning on any collider blocks/closes a path. [red]
- Rule 3: Conditioning on all colliders and on no non-colliders opens a path.
- Rule 4: Conditioning on a descendant of a collider (partially) conditions on the collider. [orange]
  
  So Rules 2 and 3 need a little updating to be completely correct. We need to avoid conditioning on colliders and all of their descendants to close a path.

Example: Two roads

“The DAG below contains an exposure of interest \(X\), an outcome of interest \(Y\), an unobserved variable \(U\), and three observed covariates (\(A\), \(B\), and \(C\))” (p. 186).

In this DAG, there are two backdoor paths from \(X\) to \(Y\)

\(X \leftarrow U \leftarrow A \rightarrow C \rightarrow Y\), which is open; and
\(X \leftarrow U \rightarrow B \leftarrow C \rightarrow Y\), which is closed.

Conditioning on either \(C\) or \(A\) will close the open backdoor.

dag_6.1 <- 
  dagitty("dag { U [unobserved]
  X -> Y;  X <- U <- A -> C -> Y;  U -> B <- C }" )

adjustmentSets(dag_6.1, exposure = "X", outcome = "Y")

## { C }
## { A }

Actually, we have some other options as well. The options above are the minimal options that avoid using \(U\).

adjustmentSets(dag_6.1, exposure = "X", outcome = "Y", type = "all")

## { A }
## { C }
## { A, C }
## { B, C }
## { A, B, C }
## { U }
## { A, U }
## { B, U }
## { A, B, U }
## { C, U }
## { A, C, U }
## { B, C, U }
## { A, B, C, U }

Backdoor waffles

This example is based on some discussion in the text and on 6H1 and 6H2. (Spoiler alert: It’s a little bit disappointing in the end, but we’ll go through it anyway.)

A DAG for Waffles

dag_coords <-
  tibble(name = c("A", "D", "M", "S", "W"),
         x    = c(1, 3, 2, 1, 3),
         y    = c(1, 1, 2, 3, 3))

waffle_dag <-
  dagify(A ~ S,
       D ~ A + M + W,
       M ~ A + S,
       W ~ S,
       coords = dag_coords) 

waffle_dag %>%
  gg_dag()

\(S\) is whether or not a State is in the southern United States
\(A\) is median age at marriage
\(M\) is marriage rate
\(W\) is number of Waffle Houses,
\(D\) is divorce rate.

This graph assumes that southern States have different ages of marriage (\(S \rightarrow A\)); different rates of marriage, both directly (\(S \rightarrow M\)) and mediated through age of marriage (\(S \rightarrow A \rightarrow M\)); and different Waffle House densities (\(S \rightarrow W\)) compared to non-Southern States. In addition, age of marriage and marriage rate both influence divorce rates.

Closing the backdoor paths

There are several backdoor paths between \(W\) and \(D\). How do we (simultaneously) close them?

library(dagitty)
waffle_dag<- 
  dagitty("dag {
  A -> D
  A -> M -> D
  A <- S -> M
  S -> W -> D
           }"
  )

paths(waffle_dag, from = "W", to = "D")

## $paths
## [1] "W -> D"                "W <- S -> A -> D"      "W <- S -> A -> M -> D"
## [4] "W <- S -> M -> D"      "W <- S -> M <- A -> D"
## 
## $open
## [1]  TRUE  TRUE  TRUE  TRUE FALSE

adjustmentSets(waffle_dag, exposure = "W", outcome = "D")

## { A, M }
## { S }

adjustmentSets(waffle_dag, exposure = "W", outcome = "D", type = "all")

## { A, M }
## { S }
## { A, S }
## { M, S }
## { A, M, S }

The first line of output indicates we’d have to condition on \(A\) and \(M\) simultaneously. As an alternative, we could just condition on \(S\). Those are the two “minimal” options. In this case, we have options to condition on some other things if we like. Any of those adjustments sets are “DAG approved” for the purpose of estimating the total causal effect of Waffle Houses on divorce rate. Any other options are open to confounding.

Here are the conditional independencies implied in that DAG.

impliedConditionalIndependencies(waffle_dag)

## A _||_ W | S
## D _||_ S | A, M, W
## M _||_ W | S

Fitting some models

Multiple regression is no oracle, but only a golem. It is logical, but the relationships it describes are conditional associations, not causal influences. Therefore additional information, from outside the model, is needed to make sense of it. This chapter presented introductory examples of some common frustrations: multicollinearity, post-treatment bias, and collider bias. Solutions to these frustrations can be organized under a coherent framework in which hypothetical causal relations among variables are analyzed to cope with confounding. (p. 189)

In the last section, we used a DAG to explore how including/excluding different covariates might influence our estimate of the causal relationship between the number of Waffle Houses and the divorce rate (\(W \rightarrow D\)). To finish that example out, we might explore some of the possible models informed by the DAG.

data(WaffleDivorce, package = "rethinking")
Waffles <- 
  WaffleDivorce %>%
  mutate(a = rethinking::standardize(MedianAgeMarriage),
         d = rethinking::standardize(Divorce),
         m = rethinking::standardize(Marriage),
         s = factor(South, levels = 0:1, labels = c("North", "South")),
         w = rethinking::standardize(WaffleHouses))

# tidy up
rm(WaffleDivorce)

The only focal variable we did not standardize was South, which is binary. Technically, you can standardize binary variables and not break the regression model. But things are easier to interpret if we leave them on their 0/1 scale. Here is our ggpairs() plot of all five focal variables. Remember, the central issue is the causal relation between w and d.

GGally::ggpairs(Waffles %>% select(a:w),
                mapping = aes(color = s, alpha = 0.6)) %>%
  gf_refine(
    scale_fill_manual(values = c("forestgreen", "lightblue"))
  )

Let’s fit a series of models. The priors are based on those we used the last time we saw the WaffleHouses data (in Chapter 5).

u6.13 <- 
  ulam(
    data = Waffles %>% select(d, w),
    alist(
      d ~ dnorm(mu, sigma),
      mu <- b0 + bw * w,
      b0 ~ dnorm(0, 0.2),
      bw ~ dnorm(0, 0.5),
      sigma ~ dexp(1)
    ),
    iter = 2000, warmup = 1000, chains = 4, cores = 4,
    seed = 6,
    file = "fits/u6.13")

u6.14 <- 
  ulam(
    data = Waffles %>% select(d, w, s),
    alist(
      d ~ dnorm(mu, sigma),
      mu <- b0 + bw * w + bs * s,
      b0 ~ dnorm(0, 0.2),
      c(bw, bs) ~ dnorm(0, 0.5),
      sigma ~ dexp(1)
    ),
    iter = 2000, warmup = 1000, chains = 4, cores = 4,
    seed = 6,
    file = "fits/u6.14")

u6.15 <- 
  ulam(
    data = Waffles %>% select(d, w, s, m),
    alist(
      d ~ dnorm(mu, sigma),
      mu <- b0 + bw * w + bs * s + bm * m,
      b0 ~ dnorm(0, 0.2),
      c(bw, bs, bm) ~ dnorm(0, 0.5),
      sigma ~ dexp(1)
    ),
    iter = 2000, warmup = 1000, chains = 4, cores = 4,
    seed = 6,
    file = "fits/u6.15")

u6.16 <- 
  ulam(
    data = Waffles %>% select(d, w, a),
    alist(
      d ~ dnorm(mu, sigma),
      mu <- b0 + bw * w + ba * a,
      b0 ~ dnorm(0, 0.2),
      c(bw, ba) ~ dnorm(0, 0.5),
      sigma ~ dexp(1)
    ),
    iter = 2000, warmup = 1000, chains = 4, cores = 4,
    seed = 6,
    file = "fits/u6.16")

u6.17 <- 
  ulam(
    data = Waffles %>% select(d, w, m),
    alist(
      d ~ dnorm(mu, sigma),
      mu <- b0 + bw * w + bm * m,
      b0 ~ dnorm(0, 0.2),
      c(bw, bm) ~ dnorm(0, 0.5),
      sigma ~ dexp(1)
    ),
    iter = 2000, warmup = 1000, chains = 4, cores = 4,
    seed = 6,
    file = "fits/u6.17")

u6.18 <- 
  ulam(
    data = Waffles %>% select(d, w, a, m),
    alist(
      d ~ dnorm(mu, sigma),
      mu <- b0 + bw * w + ba * a + bm * m,
      b0 ~ dnorm(0, 0.2),
      c(bw, bm, ba) ~ dnorm(0, 0.5),
      sigma ~ dexp(1)
    ),
    iter = 2000, warmup = 1000, chains = 4, cores = 4,
    seed = 6,
    file = "fits/u6.18")

If we extract the posterior draws from each model, the column corresponding to \(W \rightarrow D\) will be bw. Here we extract those columns, wrangle, and compare the posteriors in a coefficient plot.

library(stringr)
gf_pointinterval <-
  layer_factory(
    geom = tidybayes::GeomPointinterval, stat = ggdist::StatPointinterval,
    aes_form = y ~ x)

formula <- c("d ~ 1 + w", 
             "d ~ 1 + w + s", 
             "d ~ 1 + w + s + m", 
             "d ~ 1 + w + a", 
             "d ~ 1 + w + m",
             "d ~ 1 + w + a + m")

tibble(fit = str_c("u6.1", 3:8)) %>% 
  mutate(
    y    = str_c(fit, " (", formula, ")"),
    post = purrr::map(fit, ~get(.) %>% 
                        stanfit() %>%
                        as.data.frame() %>% 
                        select(bw))) %>% 
  unnest(post) %>% 
  gf_density_ridges(y ~ bw, scale = 0.6, alpha = 0.4,
                   fill = ~ fit %in% c("u6.14", "u6.15", "u6.18"),
                   show.legend = FALSE) %>%
  gf_pointinterval(y ~ bw, 
                   color = ~ fit %in% c("u6.14", "u6.15", "u6.18"),
                   show.legend = FALSE)

## Picking joint bandwidth of 0.0217

The \(\beta_w\) posteriors corresponding to models endorsed by our DAG are in green.

Frustratingly, the results from a real-world data analysis aren’t as conclusive as we may have naïvely supposed. It seems like there should not be a causal relationship between the number of Waffle Houses and divorce rates. But all of our models seem to think it is at least plausible that there is. And there seems to be little difference, in this case, between our “DAG approved” models and the others.

Why? Here are two possible reasons:

The DAG is wrong.

That’s probably true in this case. It’s probably true in most cases – you can always dream up other potential confounds. But we don’t need our DAG to be correct, we just need it to be useful enough.

This DAG is obviously not satisfactory–it assumes there are no unobserved confounds, which is very unlikely for this sort of data. But we can still learn something by analyzing it. While the data cannot tell us whether a graph is correct, it can sometimes suggest how a graph is wrong. (p. 187)
The model is wrong.

The DAG summarises our assumptions about causal relationships, but it does not say what the nature of the causal relationship is. Perhaps the association is not as simple as our linear model presumes.
Our intuition is wrong.

Perhaps Waffle Houses do really impact the rate of divorce in a state.

DAGs are nice tool for informing our data analytic strategy. But just as our multivariable predictor models are no fail-safes, our DAGs aren’t either. Models, theories, data-collection procedures–they all work together (or not) to determine the quality of our posteriors. To the extent any of those components are off, watch out.

More Practice

In each example below a possible causal influence in indicated. Determine which variables in the DAG should be included in your model to estimate this causal influence. Do this by following our recipe:

List all backdoor paths between the indicated variables;
For each backdoor path, determine whether it is open or closed;
Choose variables to condition on that close all backdoor paths without closing any causal paths.

Any nodes in orange are unobserved. If possible, avoid conditioning on unobserved variables.

Note: You can check your work by creating the DAG with dagitty() or dagify() and using adjustmentSets().

\(X \to Y\)
\(Z \to Y\)

impliedConditionalIndependencies(dag2)

## A _||_ X | Z

adjustmentSets(dag2, exposure = "X", outcome = "Y")

## { Z }

adjustmentSets(dag2, exposure = "Z", outcome = "Y")

## { A }

2. \(R \to G\)

impliedConditionalIndependencies(dag1)

## C _||_ G | H, R, U
## C _||_ R
## C _||_ U

adjustmentSets(dag1, exposure = "R", outcome = "G")

## { U }

adjustmentSets(dag1, exposure = "H", outcome = "G")

## { R, U }

3. \(X \to Y\)

impliedConditionalIndependencies(dag2)

## A _||_ X | Z

adjustmentSets(dag3, exposure = "X", outcome = "Y")

## { B, Z }

adjustmentSets(dag3, exposure = "Z", outcome = "Y")

## { A }

adjustmentSets(dag3, exposure = "B", outcome = "Y")

## { Z }

4. \(X \to Y\)

impliedConditionalIndependencies(dag4)

## U _||_ X | Z
## Y _||_ Z | U, X

adjustmentSets(dag4, exposure = "X", outcome = "Y")

## { U }
## { Z }

5. \(X \to Y\)

impliedConditionalIndependencies(dag5)

## A _||_ B
## A _||_ Y | X
## B _||_ X
## X _||_ Z | A
## Y _||_ Z | A, B
## Y _||_ Z | B, X

adjustmentSets(dag5, exposure = "X", outcome = "Y")

##  {}

6. \(X \to Y\)

impliedConditionalIndependencies(dag6)

## A _||_ M | X, Z
## A _||_ Y | M
## A _||_ Y | X, Z
## X _||_ Y | M
## X _||_ Z | A
## Y _||_ Z | M

adjustmentSets(dag6, exposure = "X", outcome = "Y", "all")

## { A }
## { Z }
## { A, Z }

7. \(X \to Y\)