\[ \operatorname{Pr}(A \mid B) = \frac{\operatorname{Pr}(\mbox{both})}{\operatorname{Pr}(\mbox{condition})} = \frac{\operatorname{Pr}(\mbox{joint})}{\operatorname{Pr}(\mbox{marginal})} = \frac{\operatorname{Pr}(A, B)}{\operatorname{Pr}(B)} \]
Rearranging:
\[\begin{align*} \operatorname{Pr}(A, B) &= \operatorname{Pr}(B) \cdot \operatorname{Pr}(A \mid B) \\ &= \operatorname{Pr}(A) \cdot \operatorname{Pr}(B \mid A) \end{align*}\]
More rearranging:
\[\begin{align*} \operatorname{Pr}(B) \cdot \operatorname{Pr}(A \mid B) &= \operatorname{Pr}(A) \cdot \operatorname{Pr}(B \mid A) \\[5mm] \operatorname{Pr}(A \mid B) &= \frac{\operatorname{Pr}(A) \cdot \operatorname{Pr}(B \mid A)}{\operatorname{Pr}(B)} \end{align*}\]
And some different letters (and words):
\[\begin{align*} \operatorname{Pr}(H \mid D) &= \frac{\operatorname{Pr}(H) \cdot \operatorname{Pr}(D \mid H)}{\operatorname{Pr}(D)} \\[5mm] \operatorname{Pr}(\mbox{hypothesis} \mid \mbox{data}) &= \frac{\operatorname{Pr}(\mbox{hypothesis}) \cdot \operatorname{Pr}(\mbox{data} \mid \mbox{hypothesis})}{\operatorname{Pr}(\mbox{data})} \end{align*}\]
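As a quick sanity check of this identity, here is a small R sketch with made-up probabilities (the numbers 0.12, 0.3, and 0.4 are arbitrary, chosen only so the arithmetic is easy to follow): computing \(\operatorname{Pr}(A \mid B)\) directly from the definition and via Bayes' rule gives the same answer.

```r
# made-up probabilities, chosen only to illustrate the identity
p_AB <- 0.12   # Pr(A, B), the joint probability
p_A  <- 0.30   # Pr(A)
p_B  <- 0.40   # Pr(B)

p_B_given_A <- p_AB / p_A      # Pr(B | A), from the definition
p_AB / p_B                     # Pr(A | B) directly from the definition: 0.3
p_A * p_B_given_A / p_B        # Pr(A | B) via Bayes' rule: also 0.3
```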
Suppose we have a coin with probability of heads = \(\theta\), which is either 0, 0.2, 0.4, 0.6, 0.8, or 1. We flip the coin 4 times and get HHTT.
We would like to know
\[ \operatorname{Pr}(\mbox{hypothesis} \mid \mbox{data}) = \operatorname{Pr}( \theta = 0.2 \mid HHTT) \] Using the formula above we get
\[\begin{align*} \operatorname{Pr}(\theta = 0.2 \mid HHTT) &= \frac{\operatorname{Pr}(\theta = 0.2) \cdot \operatorname{Pr}(HHTT \mid \theta = 0.2)}{\operatorname{Pr}(HHTT)} \end{align*}\]
This is (a version of) the key equation that drives all of Bayesian analysis. Of the things on the right side, which parts are easy, and which are more challenging?

- \(\operatorname{Pr}(\theta = 0.2) = \mbox{prior} = 1/6\)
- \(\operatorname{Pr}(HHTT \mid \theta = 0.2) = \mbox{likelihood} = 0.2 \cdot 0.2 \cdot 0.8 \cdot 0.8\)
    - This is generally “easy” given the model – the model must specify how data would be generated if the model is correct.
    - This measures how likely the data would be if \(\theta = 0.2\).
    - This is really \(\operatorname{Pr}(H) \cdot \operatorname{Pr}(HH \mid H) \cdot \operatorname{Pr}(HHT \mid HH) \cdot \operatorname{Pr}(HHTT \mid HHT)\), but since the coin tosses are independent, the conditional probabilities are the same as the marginal probabilities. (The quick R check below confirms the arithmetic.)
- The only tricky bit is the denominator, and we’ll come back to that in just a moment.
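The likelihood arithmetic is easy to check directly. Here is a minimal R sketch; the names `theta`, `flips`, and `per_toss` are just illustrative, not objects we need later. The last line also shows the generative side of the model: once we fix \(\theta\), we know how to simulate new data.

```r
theta <- 0.2
flips <- c("H", "H", "T", "T")                      # the observed data, HHTT

# probability of each observed toss if Pr(heads) = theta
per_toss <- ifelse(flips == "H", theta, 1 - theta)
prod(per_toss)                                      # 0.2 * 0.2 * 0.8 * 0.8 = 0.0256

# the model is generative: given theta, we can also simulate new data
sample(c("H", "T"), size = 4, replace = TRUE, prob = c(theta, 1 - theta))
```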
But first let’s compute the numerators for all 6 possible values of \(\theta\) – using R:
Let’s put all of our arithmetic into a nice table:
```r
library(dplyr)   # provides tibble() and %>%
library(pander)  # provides pander()

tibble(                                   # create a data frame
  p = 0:5 / 5,                            # our six possible probabilities
  prior = 1/6,                            # this will get copied to every row
  likelihood = p * p * (1-p) * (1-p),     # we can refer back to previous columns
  numerator = prior * likelihood
) %>% pander()                            # make the output fancy
```
p | prior | likelihood | numerator |
---|---|---|---|
0 | 0.1667 | 0 | 0 |
0.2 | 0.1667 | 0.0256 | 0.004267 |
0.4 | 0.1667 | 0.0576 | 0.0096 |
0.6 | 0.1667 | 0.0576 | 0.0096 |
0.8 | 0.1667 | 0.0256 | 0.004267 |
1 | 0.1667 | 0 | 0 |
Notice two things:

1. The denominator, \(\operatorname{Pr}(HHTT)\), is the same for each row of our table.
2. Since the posterior should be a distribution, the values of the posterior must sum to 1.

So we can simply rescale proportionally by dividing by the sum of the numerators.
```r
tibble(
  p = 0:5 / 5,
  prior = 1/6,
  likelihood = p * p * (1-p) * (1-p),
  numerator = prior * likelihood,
  posterior = numerator / sum(numerator)  # rescale so the posterior sums to 1
) %>% pander()
```
p | prior | likelihood | numerator | posterior |
---|---|---|---|---|
0 | 0.1667 | 0 | 0 | 0 |
0.2 | 0.1667 | 0.0256 | 0.004267 | 0.1538 |
0.4 | 0.1667 | 0.0576 | 0.0096 | 0.3462 |
0.6 | 0.1667 | 0.0576 | 0.0096 | 0.3462 |
0.8 | 0.1667 | 0.0256 | 0.004267 | 0.1538 |
1 | 0.1667 | 0 | 0 | 0 |
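One last note on the denominator: by the law of total probability, \(\operatorname{Pr}(HHTT)\) is exactly the sum of the numerators, so dividing by `sum(numerator)` really is dividing by \(\operatorname{Pr}(HHTT)\). A quick check, recomputing the same quantities as in the table above:

```r
p <- 0:5 / 5
prior <- 1/6
likelihood <- p * p * (1 - p) * (1 - p)
numerator <- prior * likelihood

sum(numerator)                    # Pr(HHTT), about 0.0277
sum(numerator / sum(numerator))   # the posterior sums to 1
```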
In a real application, we won’t have a finite list of possibilities but will want to consider any possible value for \(\theta\). Come back next time to find out how we deal with that.
This statement is a little bit too casual, but it will work for now to develop our intuition about priors and Bayesian inference. It would be better to say that the prior is just a part of the model, and that there are multiple considerations that go into the choice of prior.↩︎