The example below helps us explore power in the context of a (2-sided) binomial test.
When the null hypothesis is false, we would like to reject it based on our data. Whether we do so depends on several things:
- the null hypothesis value of our parameter(s) (\(\theta_0\))
- the significance level (\(\alpha\)): we will reject if p-value \(< \alpha\)
- the sample size (\(n\)): generally, more data makes it easier to reject a false null hypothesis
- a specific alternative value of \(\theta\) (\(\theta_a\)): if the null hypothesis is “just barely false”, it will be harder to reject than if it is “nowhere close to the truth”
Power is the probability of rejecting the null hypothesis given these four values (\(\theta_0\), \(\alpha\), \(n\), and \(\theta_a\)). This means that power is really a function of these four inputs. If we want a number, we must specify all four. Often we want to see how power depends on one or more of these while holding the others fixed.
Computing power is a two-step process:

1. Determine the rejection region. That is, determine which values of the test statistic will give us p-values below \(\alpha\). This calculation is made assuming \(H_0\) is true (because that’s how p-values are calculated). Note: this means the rejection region depends on \(\theta_0\), \(n\), and \(\alpha\), but not on \(\theta_a\).
2. Compute the probability that the test statistic lands in the rejection region. This probability is calculated assuming \(\theta = \theta_a\), a particular alternative value of \(\theta\).
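To see these two steps in action before we build our own machinery, here is a minimal sketch for \(n = 10\), \(\pi_0 = 0.5\), and \(\pi_a = 0.6\) (the same situation as the table below), using binom.test() to compute the two-sided p-values:

# step 1: determine the rejection region for n = 10, pi_0 = 0.5, alpha = 0.05
p_val <- sapply(0:10, function(x) binom.test(x, 10, p = 0.5)$p.value)
rejection_region <- (0:10)[p_val <= 0.05]
rejection_region                         # 0, 1, 9, and 10
# step 2: the probability of landing in that region when pi = 0.6 -- the power
sum(dbinom(rejection_region, 10, 0.6))   # about 0.048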
We start by defining a function that creates a useful data set. For each possible value \(x\) of \(X\) where \(X \sim {\sf Binom}(n, \pi_0)\), we compute \(\mathrm{P}(X = x)\) under both the null and alternative hypotheses. From this we can derive the rejection region, the power, and the true value of \(\alpha\).
There are a couple of tricky bits here so that this can handle values of \(\pi_0\) different from \(0.50\). In that case the distribution is not symmetric, so determining the edge of the “other side” is a little bit tricky. Worse, we need to worry about potential ties among the probabilities. A custom function (leq_sum()) handles that bit. (Don’t worry if you don’t fully understand how it works, but read the comments to see what it is supposed to be doing.)
Here goes.
library(manipulate)
library(fastR2)
# for each element of x, this computes the sum of all elements of x
# that are <= that element
leq_sum <- function(x) {
  apply(
    outer(x, x, function(x, y) x * (x <= y)),
    2, sum
  )
}
# an example
leq_sum(c(1, 2, 3, 4, 3, 2, 1))
## [1] 2 6 12 16 12 6 2
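If the outer()/apply() idiom in leq_sum() seems opaque, here is an equivalent (if less slick) formulation that may be easier to read; leq_sum2() is just a name made up for this comparison:

# for each element v of x, directly sum the elements of x that are <= v
leq_sum2 <- function(x) sapply(x, function(v) sum(x[x <= v]))
leq_sum2(c(1, 2, 3, 4, 3, 2, 1))   # same result as leq_sum() above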
binom_power_data <-
  function(n, p_alt, alpha = 0.05, p_null = 0.5) {
    tibble(
      x = 0:n,                            # possible values of x
      null_prob = dbinom(x, n, p_null),   # P(X = x) under H_0
      alt_prob = dbinom(x, n, p_alt)      # P(X = x) under H_a
    ) %>%
      mutate(
        # two-sided p-value: total null probability of all outcomes
        # no more likely than x (this is where leq_sum() earns its keep)
        unusual_prob = leq_sum(null_prob),
        reject = ifelse(unusual_prob <= alpha, "yes", "no")
      )
  }
binom_power_data(10, 0.6) %>% pander::pander()
x | null_prob | alt_prob | unusual_prob | reject |
---|---|---|---|---|
0 | 0.0009766 | 0.0001049 | 0.001953 | yes |
1 | 0.009766 | 0.001573 | 0.02148 | yes |
2 | 0.04395 | 0.01062 | 0.1094 | no |
3 | 0.1172 | 0.04247 | 0.3438 | no |
4 | 0.2051 | 0.1115 | 0.7539 | no |
5 | 0.2461 | 0.2007 | 1 | no |
6 | 0.2051 | 0.2508 | 0.7539 | no |
7 | 0.1172 | 0.215 | 0.3438 | no |
8 | 0.04395 | 0.1209 | 0.1094 | no |
9 | 0.009766 | 0.04031 | 0.02148 | yes |
10 | 0.0009766 | 0.006047 | 0.001953 | yes |
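As a sanity check, unusual_prob agrees with the two-sided p-value from binom.test(), which uses the same convention of summing the probabilities of all outcomes no more likely than the one observed:

binom.test(1, 10, p = 0.5)$p.value   # 0.02148..., matching unusual_prob for x = 1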
binom_power_plot <-
  function(n, p_alt, alpha = 0.05, p_null = 0.5) {
    BPD <- binom_power_data(n = n, p_alt = p_alt, alpha = alpha, p_null = p_null)
    # two copies of the data, one for each facet
    BPDnull <- BPD %>% mutate(case = factor("null", levels = c("null", "alt")))
    BPDalt <- BPD %>% mutate(case = factor("alt", levels = c("null", "alt")))
    # true alpha and power are sums over the rejection region
    true_alpha <- sum( ~ null_prob, data = BPD %>% filter(reject == "yes"))
    power <- sum( ~ alt_prob, data = BPD %>% filter(reject == "yes"))
    gf_point(null_prob ~ x, color = ~ reject, alpha = 0.7,
             data = BPDnull) %>%
      gf_segment(null_prob + 0 ~ x + x, color = ~ reject, alpha = 0.7,
                 data = BPDnull) %>%
      gf_point(alt_prob ~ x, color = ~ reject, alpha = 0.7,
               data = BPDalt) %>%
      gf_segment(alt_prob + 0 ~ x + x, color = ~ reject, alpha = 0.7,
                 data = BPDalt) %>%
      gf_facet_grid(case ~ .) %>%
      gf_lims(
        # show roughly 4 standard deviations on either side of each mean
        x = range(c(
          n * p_null + c(-1, 1) * sqrt(n * p_null * (1 - p_null)) * 4,
          n * p_alt + c(-1, 1) * sqrt(n * p_alt * (1 - p_alt)) * 4)
        )
      ) %>%
      gf_refine(scale_color_manual(values = c("lightsteelblue", "red"))) %>%
      gf_labs(
        y = "", x = "number of observed successes",
        caption = paste("nominal alpha = ", format(round(alpha, 3)),
                        "; true alpha = ", format(round(true_alpha, 3)),
                        "; power = ", format(round(power, 3))))
  }
Here are a few examples.
binom_power_plot(n = 200, p_null = 0.5, p_alt = 0.55) # from HW
binom_power_plot(n = 400, p_null = 0.5, p_alt = 0.55) # from HW
binom_power_plot(n = 50, p_null = 1/6, p_alt = 1/3) # hugo
You can use this in an interactive session in RStudio.
library(manipulate)
manipulate(plot(1:10), x = slider(5, 10))  # this is just to work around a bug in RStudio
manipulate(
  binom_power_plot(n = N, p_null = P_NULL, p_alt = P_ALT),
  N = manipulate::slider(50, 2500, 100, label = "n", step = 10),
  P_NULL = manipulate::slider(0, 1, 0.5, step = 0.01, label = "pi_0"),
  P_ALT = manipulate::slider(0, 1, 0.5, step = 0.01, label = "pi_alt")
)
We can use the same data to compute the power as a probability. We simply sum over all of the appropriate values of alt_prob.
binom_power <-
  function(n, p_alt, alpha = 0.05, p_null = 0.50, result = c("power", "true_alpha")) {
    result <- match.arg(result)  # allows for abbreviation
    Power_data <-
      binom_power_data(n = n, p_alt = p_alt, alpha = alpha, p_null = p_null) %>%
      filter(reject == "yes")
    if (result == "power") {
      sum( ~ alt_prob, data = Power_data)
    } else {
      sum( ~ null_prob, data = Power_data)
    }
  }
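As a quick sanity check on binom_power(), we can approximate the same probability by simulation. This sketch (the seed and the 10000 replications are arbitrary choices) draws samples with \(\pi = 0.40\) and counts how often the test rejects:

set.seed(123)   # arbitrary seed, just for reproducibility
sim_x <- rbinom(10000, size = 100, prob = 0.40)
sim_p <- sapply(sim_x, function(x) binom.test(x, 100, p = 0.5)$p.value)
mean(sim_p <= 0.05)   # should land close to binom_power(100, p_alt = 0.40)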
Now we can easily compute the power for any combination of n, p_alt, p_null, and alpha that we are interested in.
binom_power(100, p_alt = 0.40)
## [1] 0.4620934
binom_power(260, p_alt = 0.40)
## [1] 0.885118
binom_power(100, p_null = 0.10, p_alt = 0.20)
## [1] 0.8076661
binom_power(100, p_null = 0.10, p_alt = 0.20, result = "true_alpha")
## [1] 0.04430989
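Notice that the true \(\alpha\) comes in under the nominal \(\alpha\). That is no accident: an outcome joins the rejection region only if its two-sided p-value is at most \(\alpha\), and that p-value already counts the null probability of every outcome at least as unusual, so the total null probability of the region cannot exceed \(\alpha\). A quick empirical check (the grid of sample sizes is an arbitrary choice):

true_alphas <- sapply(10:200, function(n) binom_power(n, p_alt = 0.40, result = "true_alpha"))
all(true_alphas <= 0.05)   # should be TRUE (up to floating point fuzz)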
Here is a fancier version that can compute multiple power values all at once.
binom_power2 <- Vectorize(binom_power)
binom_power2(n = 100:110, p_alt = 0.40, alpha = 0.05, p_null = 0.50)
## [1] 0.4620934 0.5108346 0.4785745 0.5267357 0.4947125 0.5422692 0.5105669
## [8] 0.5574599 0.5261023 0.4948286 0.5413096
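Since binom_power2() accepts vectors, one natural use is a rough sample size search. Here is a sketch for finding the smallest \(n\) in a grid that achieves 80% power (the grid and the target are arbitrary choices); note that, because of the non-monotonicity visible in the output above, a slightly larger \(n\) can occasionally have lower power:

n_grid <- 100:400
pow <- binom_power2(n = n_grid, p_alt = 0.40, alpha = 0.05, p_null = 0.50)
min(n_grid[pow >= 0.80])   # smallest n in the grid with power at least 0.80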
Still fancier, let’s return a data frame that keeps track of everything in a nice format.
binom_power3 <-
  function(n, p_alt, alpha = 0.05, p_null = 0.50) {
    expand.grid(n = n, p_alt = p_alt, alpha = alpha, p_null = p_null) %>%
      mutate(
        power = binom_power2(n, p_alt, alpha = alpha, p_null = p_null),
        true_alpha = binom_power2(n, p_alt, alpha = alpha, p_null = p_null,
                                  result = "true_alpha"))
  }
binom_power3(n = c(100, 200), p_alt = c(0.40, 0.35), p_null = 0.50,
             alpha = c(0.01, 0.05)) %>%
  arrange(alpha, n, p_alt)
## n p_alt alpha p_null power true_alpha
## 1 100 0.35 0.01 0.5 0.6269243 0.006637121
## 2 100 0.40 0.01 0.5 0.2386118 0.006637121
## 3 200 0.35 0.01 0.5 0.9547297 0.008722501
## 4 200 0.40 0.01 0.5 0.5874587 0.008722501
## 5 100 0.35 0.05 0.5 0.8275851 0.035200200
## 6 100 0.40 0.05 0.5 0.4620934 0.035200200
## 7 200 0.35 0.05 0.5 0.9884358 0.048001466
## 8 200 0.40 0.05 0.5 0.7868487 0.048001466
This makes it easy to generate interesting plots of power. Here we can see that the inability to hit the nominal \(\alpha\) exactly adds “jaggies” to the plot of power vs. sample size. Generally, a larger sample size leads to higher power, but there is some noise in this trend because some sample sizes yield a true \(\alpha\) closer to the nominal \(\alpha\) than others.
binom_power3(
  n = 100:250,
  p_alt = 0.40,
  p_null = 0.50,
  alpha = 0.05) %>%
  gf_point(power ~ n | p_alt ~ ., color = ~ true_alpha, alpha = 0.5)
binom_power3(
  n = c(100, 200, 400),
  p_alt = seq(0, 1, by = 0.01),
  p_null = 0.50,
  alpha = 0.05) %>%
  gf_line(power ~ p_alt, color = ~ paste("n =", n),
          alpha = 0.5, size = 1.0) %>%
  gf_labs(color = "sample size") %>%
  gf_refine(
    scale_x_continuous(breaks = seq(0, 1, 0.1)),
    scale_y_continuous(breaks = seq(0, 1, 0.1))
  ) %>%
  plotly::ggplotly()