The example below help us explore power in the context of a (2-sided) binomial test.

What is power?

What the null alternative is false, we would like to reject it based on our data. Whether we do that will depend on several things:

Power is the probability of rejecting the null hypothesis given these four values (\(\alpha\), \(n\), \(\theta\), and \(\theta_a\)). This means that power is really a function of these four inputs. If we want a number, we must specify all four. Often we want to see how power depends on one or more of these while holding the others fixed.

Steps to computing power

Step 1: Determine the rejection region

That is, determine which values of the test statistic will give us p-values below \(\alpha\). This calculation is made assuming \(H_0\) is true (because that’s how p-values are calculated).

Note: This means depends on \(\theta_0\), \(n\), and \(\alpha\), but not on \(\theta\).

Step 2: Determine the probability of obtaining a test statistic in the rejection region

This probability is calculated assuming \(\theta = \theta_a\), a particular alternative value of \(\theta\).

Example: Power for a Binomial Test

We start by defining a function that creates a useful data set. For each possible value \(x\) of \(X\) where \(X \sim {\sf Binom}(n, \pi_0)\), we compute \(\mathrm{P}(X = x)\) under both the null and alternative hypothesis. From this we can derive the rejection region, power, and true value of \(\alpha\).

There are couple tricky bits here so this can handle values of \(\pi_0\) that are different from = \(0.50\). In this case the distribution is not symmetric, so determining the edge of the “other side” is a little bit tricky. Worse, we need to worry about potentially tie values among the probabilities. A custom function (leq_sum()) handles that bit. (Don’t worry if you don’t fully understand how it works, but read the comments to see what it is supposed to be doing.)

Here goes.

library(manipulate)
library(fastR2)
# For each value of x, this computes the sum of all values x that are 
# <= the given value

leq_sum <- function(x) {
  apply(
    outer(x, x, function(x, y) x * (x <= y)),
    2, sum
  )
}

# an example
leq_sum(c(1, 2, 3, 4, 3, 2, 1))
## [1]  2  6 12 16 12  6  2
binom_power_data <- 
  function(n, p_alt, alpha = 0.05, p_null = 0.5) {
    tibble(
      x = 0:n,            # possible value of x
      null_prob = dbinom(x, n, p_null),  
      alt_prob  = dbinom(x, n, p_alt)
    )  %>% 
      mutate(
        unusual_prob = leq_sum(null_prob), # running total
        reject =  ifelse(unusual_prob <= alpha, "yes", "no") 
      ) 
  }
binom_power_data(10, 0.6) %>% pander::pander()
x null_prob alt_prob unusual_prob reject
0 0.0009766 0.0001049 0.001953 yes
1 0.009766 0.001573 0.02148 yes
2 0.04395 0.01062 0.1094 no
3 0.1172 0.04247 0.3438 no
4 0.2051 0.1115 0.7539 no
5 0.2461 0.2007 1 no
6 0.2051 0.2508 0.7539 no
7 0.1172 0.215 0.3438 no
8 0.04395 0.1209 0.1094 no
9 0.009766 0.04031 0.02148 yes
10 0.0009766 0.006047 0.001953 yes

Visualizing Power

binom_power_plot <- 
  function(n, p_alt, alpha = 0.05, p_null = 0.5) {
    BPD <- binom_power_data(n = n, p_alt = p_alt, alpha = alpha, p_null = p_null) 
    BPDnull <- BPD %>% mutate(case = factor("null", levels = c("null", "alt")))
    BPDalt  <- BPD %>% mutate(case = factor("alt",  levels = c("null", "alt")))
    true_alpha <- sum( ~ null_prob, data = BPD %>% filter(reject == "yes"))
    power      <- sum( ~ alt_prob,  data = BPD %>% filter(reject == "yes"))
    
    gf_point(null_prob ~ x, color = ~ reject, alpha = 0.7, 
             data = BPDnull) %>%
      gf_segment(null_prob + 0 ~ x + x, color = ~ reject, alpha = 0.7, 
                 data = BPDnull) %>%
      gf_point(alt_prob ~ x, color = ~ reject, alpha = 0.7, 
               data = BPDalt) %>%
      gf_segment(alt_prob + 0 ~ x + x, color = ~ reject, alpha = 0.7, 
                 data = BPDalt) %>%
      gf_facet_grid( case ~ .) %>%
      gf_lims(
        x = range(c(
          n * p_null + c(-1, 1) * sqrt(n * p_null * (1-p_null)) * 4,
          n * p_alt + c(-1, 1) * sqrt(n * p_alt * (1-p_alt)) * 4)
        )
      ) %>%
      gf_refine(scale_color_manual(values = c("lightsteelblue", "red"))) %>%
      gf_labs(
        y = "", x = "number of observed successes",
        caption = paste("nominal alpha = ", format(round(alpha, 3)),
                        "; true alpha = ",    format(round(true_alpha, 3)), 
                        "; power = ",       format(round(power, 3))))
  }

Here are a few examples.

binom_power_plot(n = 200, p_null = 0.5, p_alt = 0.55)  # from HW

binom_power_plot(n = 400, p_null = 0.5, p_alt = 0.55)  # from HW

binom_power_plot(n = 50, p_null = 1/6, p_alt = 1/3)    # hugo

Interactive version

You can use this in an interactive session in RStudio.

library(manipulate)
manipulate(plot(1:10), x = slider(5, 10))  # this is just to work around a bug in RStudio
manipulate(
  binom_power_plot(n = N, p_null = P_NULL, p_alt = P_ALT),
  N = manipulate::slider(50, 2500, 100, label = "n", step = 10),
  P_NULL = manipulate::slider(0, 1, 0.5, step = 0.01, label = "pi_0"),
  P_ALT  = manipulate::slider(0, 1, 0.5, step = 0.01, label = "pi_alt")
)

Just give me the power, please

We can use the same data to compute the power as a probability. We simply sum over all of the appropriate values of alt_prob.

binom_power <- 
  function(n, p_alt, alpha = 0.05, p_null = 0.50, result = c("power", "true_alpha")) {
    result <- match.arg(result) # allows for abbreviation
    Power_data <- 
      binom_power_data(n = n, p_alt = p_alt, alpha = alpha, p_null = p_null) %>%
      filter(reject == "yes")
    if (result == "power") {
      sum( ~ alt_prob, data = Power_data)
    } else {
      sum( ~ null_prob, data = Power_data)
    }
  }

Now can easily compute power of any combination of n, null_prob, alt_prob and alpha tha we are interested in.

binom_power(100, p_alt = 0.40)
## [1] 0.4620934
binom_power(260, p_alt = 0.40)
## [1] 0.885118
binom_power(100, p_null = 0.10, p_alt = 0.20)
## [1] 0.8076661
binom_power(100, p_null = 0.10, p_alt = 0.20, result = "true_alpha")
## [1] 0.04430989

Getting fancy

Here is a fancier version of that can compute multiple power values all at once.

binom_power2 <- Vectorize(binom_power)

binom_power2(n = 100:110, p_alt = 0.40, alpha = 0.05, p_null = 0.50)
##  [1] 0.4620934 0.5108346 0.4785745 0.5267357 0.4947125 0.5422692 0.5105669
##  [8] 0.5574599 0.5261023 0.4948286 0.5413096

Still fancier, let’s return a data frame that keeps track of everything in a nice format.

binom_power3 <- 
  function(n, p_alt, alpha = 0.05, p_null = 0.50) {
    expand.grid(n = n, p_alt = p_alt, alpha = alpha, p_null = p_null) %>%
      mutate(
        power = binom_power2(n, p_alt, alpha = alpha, p_null = p_null),
        true_alpha = binom_power2(n, p_alt, alpha = alpha, p_null = p_null, 
                                  result = "true_alpha"))
  }
binom_power3 ( n = c(100, 200), p_alt = c(0.40, 0.35), p_null = 0.50, 
               alpha = c(0.01, 0.05)) %>% arrange(alpha, n, p_alt)
##     n p_alt alpha p_null     power  true_alpha
## 1 100  0.35  0.01    0.5 0.6269243 0.006637121
## 2 100  0.40  0.01    0.5 0.2386118 0.006637121
## 3 200  0.35  0.01    0.5 0.9547297 0.008722501
## 4 200  0.40  0.01    0.5 0.5874587 0.008722501
## 5 100  0.35  0.05    0.5 0.8275851 0.035200200
## 6 100  0.40  0.05    0.5 0.4620934 0.035200200
## 7 200  0.35  0.05    0.5 0.9884358 0.048001466
## 8 200  0.40  0.05    0.5 0.7868487 0.048001466

More power plots

Power vs sample size

This makes it easy to generate interesting plots of power. Here we can see that the inability to hit the nominal \(\alpha\) exactly adds “jaggies” to the plot of power vs. sample size. Generally, a larger sample size leads to higher power, but there is some noise to this trend because some combinations of \(\alpha\) and \(\pi_0\) yield true values of \(\alpha\) that are closer to the nominal \(\alpha\).

binom_power3 ( 
  n = 100:250,
  p_alt = 0.40, 
  p_null = 0.50, 
  alpha = 0.05) %>%
  gf_point(power ~ n | p_alt ~ ., color = ~ true_alpha, alpha = 0.5)

Power vs alternative

binom_power3( 
  n = c(100, 200, 400),
  p_alt = seq(0, 1, by = 0.01),
  p_null = 0.50, 
  alpha = 0.05) %>%
  gf_line(power ~ p_alt , color = ~ paste("n =", n), 
          alpha = 0.5, size = 1.0) %>%
  gf_labs(color = "sample size") %>%
  gf_refine(
    scale_x_continuous(breaks = seq(0, 1, 0.1)),
    scale_y_continuous(breaks = seq(0, 1, 0.1))
  ) %>%
  plotly::ggplotly()