\[ \operatorname{Pr}(A \mid B) = \frac{\operatorname{Pr}(\mbox{both})}{\operatorname{Pr}(\mbox{condition})} = \frac{\operatorname{Pr}(\mbox{joint})}{\operatorname{Pr}(\mbox{marginal})} = \frac{\operatorname{Pr}(A, B)}{\operatorname{Pr}(B)} \]
Rearranging:
\[\begin{align*} \operatorname{Pr}(A, B) &= \operatorname{Pr}(B) \cdot \operatorname{Pr}(A \mid B) \\ &= \operatorname{Pr}(A) \cdot \operatorname{Pr}(B \mid A) \end{align*}\]
More rearranging:
\[\begin{align*} \operatorname{Pr}(B) \cdot \operatorname{Pr}(A \mid B) &= \operatorname{Pr}(A) \cdot \operatorname{Pr}(B \mid A) \\[5mm] \operatorname{Pr}(A \mid B) &= \frac{\operatorname{Pr}(A) \cdot \operatorname{Pr}(B \mid A)}{\operatorname{Pr}(B)} \end{align*}\]
And some different letters (and words):
\[\begin{align*} \operatorname{Pr}(H \mid D) &= \frac{\operatorname{Pr}(H) \cdot \operatorname{Pr}(D \mid H)}{\operatorname{Pr}(D)} \\[5mm] \operatorname{Pr}(\mbox{hypothesis} \mid \mbox{data}) &= \frac{\operatorname{Pr}(\mbox{hypothesis}) \cdot \operatorname{Pr}(\mbox{data} \mid \mbox{hypothesis})}{\operatorname{Pr}(\mbox{data})} \end{align*}\]
The big take-away:
\[ \mbox{posterior} \propto \mbox{prior} \cdot \mbox{likelihood} \]
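To see the proportionality at work on a tiny case, here is a minimal base R sketch with two hypotheses. The hypothesis names and all of the numbers are made up purely for illustration; they do not come from any data.

```r
# Hypothetical two-hypothesis example (illustrative numbers only)
prior      <- c(H1 = 0.3, H2 = 0.7)      # Pr(hypothesis)
likelihood <- c(H1 = 0.9, H2 = 0.2)      # Pr(data | hypothesis)

unnormalized <- prior * likelihood        # prior * likelihood
evidence     <- sum(unnormalized)         # Pr(data), the normalizing constant
posterior    <- unnormalized / evidence   # Pr(hypothesis | data)

posterior
```

Dividing by the sum is the only step that the proportionality sign hides: it changes the scale of the posterior but not the relative weights of the hypotheses.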
Data: 6 of 9 sampled points were water
Prior: \(p \sim \operatorname{Unif}(0,1)\)
Likelihood: How likely are the data given a parameter value?
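If we assume the 9 sampled points are independent, the likelihood of 6 waters is binomial: \(\binom{9}{6} p^6 (1-p)^3\). The sketch below (base R, with a few values of \(p\) chosen arbitrarily for illustration) suggests why the constant \(\binom{9}{6}\) can be dropped: it cancels as soon as we rescale.

```r
p <- c(0.2, 0.5, 0.7, 0.9)                 # a few candidate values (illustrative)

full   <- dbinom(6, size = 9, prob = p)    # binomial likelihood, constant included
kernel <- p^6 * (1 - p)^3                  # the same likelihood without choose(9, 6)

full / kernel                              # the same ratio at every p: choose(9, 6)
all.equal(full / sum(full), kernel / sum(kernel))   # identical after rescaling
```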
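library(tidyr)   # expand_grid()
library(dplyr)   # mutate(), %>%
WaterGrid <-
  expand_grid(
    p = seq(0, 1, length.out = 501)            # grid of candidate values for p
  ) %>%
  mutate(
    prior = 1,                                 # Unif(0, 1) density
    likelihood = p^6 * (1-p)^3,                # 6 water, 3 not water
    likelihood = likelihood / sum(likelihood), # rescale to sum to 1
    posterior0 = prior * likelihood,           # unnormalized posterior
    posterior = posterior0 / sum(posterior0)   # rescale to sum to 1
  )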
library(ggformula)   # gf_line()
library(patchwork)   # lets | place plots side by side
gf_line( prior ~ p, data = WaterGrid) |
  gf_line( likelihood ~ p, data = WaterGrid) |
  gf_line( posterior ~ p, data = WaterGrid)
Those plots are on very different scales. Mostly this doesn’t matter. But technically, the posterior plot is wrong since we haven’t scaled it properly: its values sum to 1, but the posterior is a density, so it should be scaled so that the area under the curve is 1. The likelihood is not a density, but we can rescale it the same way to make it comparable with the other two.
To rescale to get area 1, we need to take into account the width of our grid. If we let \(h_p\) be the height of one of these curves at position \(p\), and approximate the area with rectangles of width \(\Delta p\), we want
\[
\sum h_p \Delta p = 1
\]
so
\[
\sum h_p = 1 / \Delta p
\]
Our grid has 501 equally spaced points on \([0, 1]\), so \(\Delta p = 1/500 = 0.002\). We can get the sum we want by adding an additional / 0.002 to the rescaling that made the sum equal 1.
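As a quick check of this rescaling, the following base R sketch computes the grid spacing from the grid itself and confirms that the rectangle approximation to the area comes out to 1. The names `dp`, `h0`, and `h` are introduced here only for illustration.

```r
p  <- seq(0, 1, length.out = 501)
dp <- diff(p)[1]                 # grid spacing: 1 / 500 = 0.002

h0 <- p^6 * (1 - p)^3            # unnormalized heights
h  <- h0 / sum(h0) / dp          # rescale so that sum(h) * dp is 1

sum(h) * dp                      # rectangle approximation to the area under h
```

Computing `dp` from the grid, rather than typing 0.002, keeps the rescaling correct if the number of grid points changes.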
WaterGrid <-
  expand_grid(
    p = seq(0, 1, length.out = 501)                        # grid spacing is 1/500 = 0.002
  ) %>%
  mutate(
    prior0 = 1,
    prior = prior0 / sum(prior0) / 0.002,                  # area under the curve is 1
    likelihood0 = p^6 * (1-p)^3,
    likelihood = likelihood0 / sum(likelihood0) / 0.002,   # area under the curve is 1
    posterior0 = prior * likelihood,                       # unnormalized posterior
    posterior = posterior0 / sum(posterior0) / 0.002       # area under the curve is 1
  )
Now we can plot these functions on the same scale:
gf_line( prior ~ p, data = WaterGrid, color = ~ "prior") %>%
gf_line( likelihood ~ p, data = WaterGrid, color = ~ "likelihood", size = 1.7) %>%
gf_line( posterior ~ p, data = WaterGrid, color = ~ "posterior")
We saved the intermediate values (prior0, likelihood0, posterior0) only so that we can inspect them. In the next example we rescale in place.
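Before changing the prior, the grid approximation can be checked against a known answer. With a uniform prior and 6 waters in 9 samples, the exact posterior is a Beta(7, 4) distribution. This is a standard conjugacy result that the notes above do not state, so treat the comparison below as a side check. It is a minimal base R sketch that rebuilds the grid on its own instead of using WaterGrid.

```r
p  <- seq(0, 1, length.out = 501)
dp <- 1 / 500                                     # grid spacing

grid_post  <- p^6 * (1 - p)^3                     # uniform prior * likelihood
grid_post  <- grid_post / sum(grid_post) / dp     # rescale in place to a density
exact_post <- dbeta(p, shape1 = 7, shape2 = 4)    # exact posterior density

max(abs(grid_post - exact_post))                  # should be very close to 0
```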
New Prior: \(p \sim \operatorname{Triangle}(0, 1, 0.7)\)
library(triangle)   # dtriangle()
WaterGrid2 <-
  expand_grid(
    p = seq(0, 1, length.out = 501)
  ) %>%
  mutate(
    prior = dtriangle(p, 0, 1, 0.7),                     # triangle density on [0, 1] with mode 0.7
    likelihood = p^6 * (1-p)^3,
    likelihood = likelihood / sum(likelihood) / 0.002,   # rescale in place
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior) / 0.002       # rescale in place
  )
gf_line( prior ~ p, data = WaterGrid2, color = ~ "prior") %>%
gf_line( likelihood ~ p, data = WaterGrid2, color = ~ "likelihood", size = 1.7) %>%
gf_line( posterior ~ p, data = WaterGrid2, color = ~ "posterior")
Let’s compare the two posteriors.
gf_line( posterior ~ p, data = WaterGrid, color = ~ "uniform prior") %>%
gf_line( posterior ~ p, data = WaterGrid2, color = ~ "triangle prior")
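The comparison can also be made numerically. The sketch below is base R only and rebuilds both posteriors from scratch. `dtri()`, `normalize()`, and `summarize_post()` are hypothetical helpers written for this illustration; `dtri()` is assumed to match `dtriangle(p, 0, 1, 0.7)`, that is, a triangle density on \([0, 1]\) with mode 0.7.

```r
# Hypothetical helper: triangle density on [a, b] with its mode at `peak`
dtri <- function(x, a = 0, b = 1, peak = 0.7) {
  ifelse(x < a | x > b, 0,
    ifelse(x < peak,
      2 * (x - a) / ((b - a) * (peak - a)),    # rising side
      2 * (b - x) / ((b - a) * (b - peak))))   # falling side
}

p  <- seq(0, 1, length.out = 501)
dp <- 1 / 500
likelihood <- p^6 * (1 - p)^3

# Rescale prior * likelihood so that the area under the curve is 1
normalize <- function(h) h / sum(h) / dp
post_unif <- normalize(1 * likelihood)         # uniform prior
post_tri  <- normalize(dtri(p) * likelihood)   # triangle prior

# Posterior mean and mode (grid approximations) under each prior
summarize_post <- function(post) {
  c(mean = sum(p * post) * dp, mode = p[which.max(post)])
}
rbind(uniform = summarize_post(post_unif), triangle = summarize_post(post_tri))
```

The mean and mode are only two of many possible summaries; they are used here because they are easy to read off the same grid that produced the plots.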