5 Conjugate Priors

1. Show that if $\alpha, \beta > 1$, then the mode of a ${\sf Beta}(\alpha, \beta)$ distribution is $\displaystyle {\frac{\alpha -1}{\alpha +\beta -2}}$.

Hint: What would you do if you were in Calculus I?

2. Suppose we have a coin that we know comes from a magic-trick store, and therefore we believe that the coin is strongly biased either usually to come up heads or usually to come up tails, but we don’t know which.

Express this belief as a beta prior. That is, find shape parameters that lead to a beta distribution that corresponds to this belief.
Now we flip the coin 5 times and it comes up heads in 4 of the 5 flips. What is the posterior distribution?
Use quick_bern_beta() or a similar function of your own creation to show the prior and posterior graphically.

3. Suppose we estimate a proprtion $\theta$ using a ${\sf Beta}(10, 10)$ prior and a observe 26 successes and 48 failures.

What is the posterior distribution?
What is the mean of the posterior distribution?
What is the mode of the posterior distribution?
qbeta(c(0.025, 0.975), shape1 = ??, shape2 = ??) will create a 95% interval for a beta distribution, but it usually won’t be the highest density interval. (xqbeta() will compute the interval and make a plot.)
Compute this interval for your posterior distribution.
Which direction do you need to move each end of the interval to get the HDI? Explain how you know.

4. In this problem we will explore the HDI for a beta distribution a bit more.

Let’s start out with a very concrete situation: a 95% interval for a Beta(10, 10) distribution. Because this distribution is symmetric, the 95% HDI is especially easy, we just need to eliminate 2.5% from each end of the distribution. Use qbeta() to find the 95% HDI.
Now use xqbeta() or xpbeta() to illustrate the interval. [Hint: For xqbeta() you can use c(0.025, 0.975) to do both ends at the same time. You can do a similar thing with xpbeta().]
Now let’s try a Beta(5, 15) distribution. This distribution is not symmetric. We can still create a 95% interval by removing 2.5% on each end, but it won’t be the HDI. Compute this interval, and use xqbeta() to illustrate it. How must the interval be adjusted to become the 95% HDI?
For a non-symmetric distribution, the amount removed from the left and right tails to get the HDI will differ, but the total will still be 5%. Pick an amount different from 2.5% to remove from the left tail and determine the corresponding 95% interval. Is your new interval wider or narrower than the previous interval?
We will have the HDI when the height of the beta pdf is the same at each end of the interval. Let $p$ be the proportion of the distribution in the left tail removed by the HDI. We can approximate $p$ using the higher-lower guessing game. At the start, we know that $p \in (0, 0.025]$. Play the game for 4 rounds. After 4 rounds, what interval do you know contains the correct value for $p$?
R can automate playing this game for you using a function called uniroot(). uniroot() tries to find an input that makes a function value be 0, assuming that the function is continuous. To use uniroot() we must provide the function $f$ and an interval $(l, h)$ such that $f(l) < 0$ and $f(h) > 0$ (or $f(l) > 0$ and $f(h) < 0$). uniroot() will play the higher-lower game within this interval.

The natural function to use for our situation is
```
f <- function(p) {
  p2 <- 0.05 - p
  q  <- qbeta(p,  5, 15)
  q2 <- qbeta(1 - p2, 5, 15)
  # print(tibble(p, p2, q, q2))
  dbeta(q, 5, 15) - dbeta(q2, 5, 15)
}
```
1. What is p2?
2. What are q and q2?
3. Why do we have 1 - p2 in the creation of q2?
4. Compute f() on the values of $p$ from your higher-lower guessing game. Do you see how the sign of f() is related to the next guess?
Now use
```
uniroot(f, c(0, 0.05))
```
```
## $root
## [1] 0.013346
## 
## $f.root
## [1] 0.0007323517
## 
## $iter
## [1] 4
## 
## $init.it
## [1] NA
## 
## $estim.prec
## [1] 6.103516e-05
```
to find the 95% HDI. Use xqbeta() or xpbeta() to illustrate.

If you uncomment the print statement, you can see all the guesses made along the way. uniroot() doesn’t always guess the midpoint between the two ends of the interval – it usues information about the function value at the two ends and guesses closer to the one that has a value closer to 0. This usually makes it very efficient.
f() is a pretty generic name. What might have been a better name for this function?

5. Now let’s generalize our previous exercise to handle any beta distribution and any probability for the HDI.

Edit the following code to create a working function.

hdi_beta <- function(shape1, shape2, prob = 0.95) {

  f <- function(p) {
    p2 <- ?? - p
    q  <- qbeta(??,  shape1, shape2)
    q2 <- qbeta(??, shape1, shape2)
    # print(tibble(p, p2, q, q2))
    dbeta(q, shape1, shape2) - dbeta(q2, shape1, shape2)
  }

  lo <- uniroot(f, c(??, ??))$root
  hi <- ??
  c(??, ??)
}

Feel free to rename things like f and lo and hi to make the code clearer. (I’ve left them a bit generic to force you to think through the code a bit more, but I recommend using better names.) You may also add additional lines of code if that makes things clearer.

Test your function on a few examples and demonstrate that it is giving correct values. xpbeta(hdi_beta(...), ...) is one way you can inspect the results.
Use your function to find the following intervals:
1. 95% HDI for Beta(5, 15)
2. 90% HDI for Beta(40, 60)
3. 92% HDI for Beta(60, 40)

6. Suppose a state-wide election is approaching, and you are interested in knowing whether the general population prefers the democrat or the republican. There is a just-published poll in the newspaper, which states that of 100 randomly sampled people, 58 preferred the republican and the remainder preferred the democrat.

Suppose that before the newspaper poll, your prior belief was a uniform distribution. What is the 95% HDI on your beliefs after learning of the newspaper poll results? [Reminder: ${\sf Unif}(0,1) = {\sf Beta}(1,1)$.]
Based on what you know about elections, why is a uniform prior not a great choice? Repeat part (a) with a prior the conforms better to what you know about elections. Explain your choice of prior. How much does the change of prior affect the 95% HDI?
You find another poll conducted by a different news organization In this second poll, 56 of 100 people preferred the republican. Assuming that peoples’ opinions have not changed between polls, what is the 95% HDI on the posterior taking both polls into account. Make it clear which prior you are using.
Based on the data from both polls (and your choice of prior, and assuming public opinion doesn’t change between the time of the polls and election day), what is the posterior probability that the republican will win the election? That is, what proportion of the posterior distribution has the republican winning?

7. The hdi_beta() function described above only works for unimodal beta distributions.

For the purposes of working with Bayesian posteriors, why is this not a major deficiency?
Modify the function to work with other beta distributions that are not unimodal.