A statistical procedure is robust if it works reasonably well even when the assumptions on which it is based do not hold (at least when they fail in particular ways).
Our t-based confidence intervals and p-values are based on iid random sampling from a normal population. We can use simulations to see how much the shape of the population matters.1
It’s a good idea to know what you want the computer to do before you launch into writing the code to do it. So here’s the plan for one simulation (sketched in code just after this list):

1. Generate a random sample from a population. This involves choosing the population and the sample size. If we are checking robustness of a hypothesis test, then the population should be one that makes the null hypothesis true.

2. Apply the procedure to the sample. In our example, we will want to send the data to t.test().

3. Record the part of the output we need. Functions like t.test() do more than we need. We might keep track of the p-value (using pval()) or a confidence interval (using confint()) or something else.
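In code, that plan might look something like this schematic (it previews the exponential example that is developed step by step below):

set.seed(314159)
x <- rexp(15, rate = 1 / 314)  # 1. sample from the chosen population
tt <- t.test(x)                # 2. apply the procedure to the sample
confint(tt)                    # 3. record the part of the output we need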
Once we have figured out how to do one simulation, we need to do it many times. There are at least two approaches we could use for this.
Now that we have our plan outlined, we need to implement it. Let’s illustrate by seeing how well the t confidence intervals work when the population is normal (they should work great) and when the population is exponential (they might not work as well, since an exponential distribution is not very much like a normal distribution).
We’ll outline two approaches. But they start the same: figuring out how to code one simulation. Let’s build up bit by bit.
Create one random sample:
set.seed(314159)
rexp(15, rate = 1 / 314)
##  [1]  485.93566  420.40604  114.10714  253.54126  337.69546  297.51560
##  [7]  182.05810  443.91215  661.68366 1420.42388  119.09389 1068.77085
## [13]   34.63898   10.12708  293.42572
T-test for one sample:
set.seed(314159)
t.test(rexp(15, rate = 1 / 314))
##
## One Sample t-test
##
## data: rexp(15, rate = 1/314)
## t = 4.0895, df = 14, p-value = 0.001104
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 194.7616 624.3498
## sample estimates:
## mean of x
## 409.5557
Extract confidence interval or p-value:
set.seed(314159)
confint(t.test(rexp(15, rate = 1 / 314)))
##   mean of x    lower    upper level
## 1  409.5557 194.7616 624.3498  0.95
set.seed(314159)
pval(t.test(rexp(15, rate = 1 / 314)))
##     p.value
## 0.001104466
This is already a one-liner, but sometimes it is nice to create a custom function.
# Simulate one exponential sample of size n with the given mean and
# return the t-based confidence interval for the population mean.
exp_interval <- function(n, mean) {
  confint(t.test(rexp(n, rate = 1 / mean)))
}
set.seed(314159)
exp_interval(15, 314)
The first approach uses do() from the mosaic package to repeat the simulation many times and gather the results into a data frame:

set.seed(314159)
SimsExp <- do(10000) * confint(t.test(rexp(15, rate = 1 / 314)))
head(SimsExp)
##   mean.of.x    lower    upper level .row .index
## 1  385.3649 163.7523 606.9776  0.95    1      1
## 2  343.7076 176.2650 511.1502  0.95    1      2
## 3  254.6690 122.4845 386.8536  0.95    1      3
## 4  356.4574 166.2427 546.6721  0.95    1      4
## 5  436.1487 183.8851 688.4124  0.95    1      5
## 6  275.0616 140.5288 409.5944  0.95    1      6
Sims2Exp <-
  SimsExp |>
  mutate(
    cover = case_when(
      upper < 314 ~ " too low",   # leading space makes this level sort first when tallied
      lower > 314 ~ "too high",
      TRUE ~ "just right"
    )
  )
head(Sims2Exp)
##   mean.of.x    lower    upper level .row .index      cover
## 1  385.3649 163.7523 606.9776  0.95    1      1 just right
## 2  343.7076 176.2650 511.1502  0.95    1      2 just right
## 3  254.6690 122.4845 386.8536  0.95    1      3 just right
## 4  356.4574 166.2427 546.6721  0.95    1      4 just right
## 5  436.1487 183.8851 688.4124  0.95    1      5 just right
## 6  275.0616 140.5288 409.5944  0.95    1      6 just right
tally(~ cover, data = Sims2Exp)
## cover
##    too low just right   too high
##        846       9117         37
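Since there are 10,000 simulations, proportions may be easier to read than raw counts; tally() can report them via its format argument:

tally(~ cover, data = Sims2Exp, format = "proportion")

About 91% of these intervals cover the true mean of 314, a bit short of the advertised 95%.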
The second approach builds a grid of simulation settings with expand.grid(). This has the advantage of making it easy to run several variations on our simulation all at once.
Create a grid:
expand.grid(rep = 1:3, n = c(15, 60))
##   rep  n
## 1   1 15
## 2   2 15
## 3   3 15
## 4   1 60
## 5   2 60
## 6   3 60
Add in the results of the simulation (and add many more reps, and a couple more sample sizes):
set.seed(314159)
SimsExpGrid <-
  expand.grid(rep = 1:10000, n = c(15, 30, 60, 100)) |>
  group_by(rep, n) |>
  mutate(exp_interval(n, mean = 314))  # using our custom function; true mean matches the 314 checked below
head(SimsExpGrid)
## # A tibble: 6 × 6
## # Groups:   rep, n [6]
##     rep     n `mean of x` lower upper level
##   <int> <dbl>       <dbl> <dbl> <dbl> <dbl>
## 1     1    15       175.   83.1  266.  0.95
## 2     2    15       117.   74.8  160.  0.95
## 3     3    15        83.7  58.5  109.  0.95
## 4     4    15       142.   79.7  204.  0.95
## 5     5    15       119.   46.6  192.  0.95
## 6     6    15       119.   71.5  167.  0.95
Note: group_by() causes mutate() to be applied separately to the sub-data frames determined by rep and n. It is important that rep and n together uniquely identify a single row of the grid. If that were not the case, we would need to do something more complicated in the mutate() command to control how vectorization is used. If we adopted one of those more complicated approaches, we wouldn’t need the group_by() here.2
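As a sketch of one such alternative, using purrr (one of the options mentioned in footnote 2), we could run our custom function once per row of the grid and bind the results back on; this version does not require rep and n to uniquely identify rows:

library(purrr)

Grid <- expand.grid(rep = 1:10000, n = c(15, 30, 60, 100))
SimsExpGrid <-
  # pmap_dfr() calls the function once per row of Grid, passing rep and n
  # as named arguments, and row-binds the resulting intervals
  bind_cols(Grid, pmap_dfr(Grid, function(rep, n) exp_interval(n, mean = 314)))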
As before, we can add in some additional information:
SimsExpGrid2 <-
  SimsExpGrid |>
  ungroup() |>
  mutate(
    cover = case_when(
      upper < 314 ~ " too low",   # leading space again, to control sort order
      lower > 314 ~ "too high",
      TRUE ~ "just right"
    )
  )
Alternatively, we could have added this extra information inside our custom function. That would be especially useful if we were running a lot of similar simulations, since we would only need to include that code in one place.
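For example, a hypothetical exp_interval2() that classifies its own interval might look like this (a sketch reusing the coverage labels from above; since the true mean is an argument to the function, the check can live inside it):

# Hypothetical variant of exp_interval() that also records coverage.
exp_interval2 <- function(n, mean) {
  confint(t.test(rexp(n, rate = 1 / mean))) |>
    mutate(
      cover = case_when(
        upper < mean ~ " too low",   # interval entirely below the true mean
        lower > mean ~ "too high",   # interval entirely above the true mean
        TRUE ~ "just right"
      )
    )
}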
Now let’s look at the results:
tally(cover ~ n, data = SimsExpGrid2)
##             n
## cover           15    30    60   100
##     too low   9634  9992 10000 10000
##   just right   366     8     0     0
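Proportions again make the comparison across sample sizes easier to read:

tally(cover ~ n, data = SimsExpGrid2, format = "proportion")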
We see that things work better as the sample size increases, but clearly things don’t look as good as they did when sampling from a normal population. Also, the misses are not symmetric: we are more likely to miss low than to miss high.
We could do the same thing with p-values instead of confidence intervals.
set.seed(12345)
PvalSims <-
  expand.grid(rep = 1:10000, n = c(15, 30, 60, 100)) |>
  group_by(rep, n) |>
  mutate(pval = pval(t.test(rexp(n, rate = 1 / 314), mu = 314)))
head(PvalSims)
## # A tibble: 6 × 3
## # Groups:   rep, n [6]
##     rep     n   pval
##   <int> <dbl>  <dbl>
## 1     1    15 0.984
## 2     2    15 0.150
## 3     3    15 0.852
## 4     4    15 0.326
## 5     5    15 0.366
## 6     6    15 0.0580
gf_qq(~ pval | n, data = PvalSims, distribution = qunif) |>
  gf_abline(slope = ~1, intercept = ~0, color = "red")
# zooming in on the most interesting part of the distribution (small p-values)
gf_qq(~ pval | n, data = PvalSims, distribution = qunif) |>
  gf_abline(slope = ~1, intercept = ~0, color = "red") |>
  gf_lims(x = c(0, 0.15), y = c(0, 0.15))
## Warning: Removed 34000 rows containing missing values (geom_point).
We see from this simulation that the small p-values are too small: if the test were behaving as advertised, the p-values would be (approximately) uniformly distributed when the null hypothesis is true, but we get more small p-values than that, especially for the smaller sample sizes. So using this method would cause us to reject the null hypothesis too readily.
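One way to quantify “too readily” is to estimate the achieved rejection rate at the 5% level for each sample size; for a well-behaved test of a true null, this should be close to 0.05. A quick sketch (reject_rate is just a name chosen here):

PvalSims |>
  group_by(n) |>
  summarise(reject_rate = mean(pval < 0.05))  # nominal rate is 0.05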
1. The alternative would be to work out the distribution theoretically. In some situations we can do this, but it is generally a lot more work, and things don’t always work out as nicely as they did when we assumed we had an iid random sample from a Norm(μ, σ) population.
2. If you are curious to know how else this might have been done, look at the sapply() function and the functions in the purrr package.