Robustness

A statistical procedure is robust if it works reasonably well even when the assumptions on which it is based do not hold (in a particular way).

Our t-based confidence intervals and p-values are based on iid random sampling from a normal population. We can use simulations to see how much the shape of the population matters.1

Creating a simulation in R

Step 1: Have a plan

It’s a good idea to know what you want the computer to do before you launch into writing the code to do it. So here’s the plan for one simulation:

1a. Generate a random sample.

This involves choosing

  • sample size (\(n\))
  • the distribution to sample from (family and parameters)

If we are checking robustness of a hypothesis test, then the population should be one that makes the null hypothesis true.

1b. Apply the statistical procedure of interest to the random sample.

In our example, we will want to send the data to t.test().

1c. Extract the information we want to keep track of.

Functions like t.test() do more than we need. We might keep track of the p-value (using pval() from the mosaic package) or a confidence interval (using confint()) or something else.

1d. Repeat the simulation many times.

Once we have figured out how to do one simulation, we need to do it many times. There are at least two approaches we could use for this.

Step 2. Implement the plan

Now that we have our plan outlined, we need to implement it. Let's illustrate by seeing how well the t confidence intervals work when the population is normal (should be great) and when the population is exponential (might not be as good, since an exponential distribution is not very much like a normal distribution).

Code one simulation

We’ll outline two approaches. But they start the same: figuring out how to code one simulation. Let’s build up bit by bit.

Create one random sample:

set.seed(314159)
rexp(15, rate = 1 / 314)
##  [1]  485.93566  420.40604  114.10714  253.54126  337.69546  297.51560
##  [7]  182.05810  443.91215  661.68366 1420.42388  119.09389 1068.77085
## [13]   34.63898   10.12708  293.42572

T-test for one sample:

set.seed(314159)
t.test(rexp(15, rate = 1 / 314))
## 
##  One Sample t-test
## 
## data:  rexp(15, rate = 1/314)
## t = 4.0895, df = 14, p-value = 0.001104
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  194.7616 624.3498
## sample estimates:
## mean of x 
##  409.5557

Extract confidence interval or p-value:

set.seed(314159)
confint(t.test(rexp(15, rate = 1 / 314)))
##   mean of x    lower    upper level
## 1  409.5557 194.7616 624.3498  0.95
set.seed(314159)
pval(t.test(rexp(15, rate = 1 / 314)))
##     p.value 
## 0.001104466

This is already a one-liner, but sometimes it is nice to create a custom function.

  • You can add some additional post-processing if you like.
  • Give some thought to the arguments of your function. What elements of the simulation might you change?
exp_interval <- function(n, mean) {
  # return the t-based CI from one random exponential sample
  confint(t.test(rexp(n, rate = 1 / mean)))
}
set.seed(314159)
exp_interval(15, 314)
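
Because the seed and the computation are the same, this prints the interval we already extracted above:

##   mean of x    lower    upper level
## 1  409.5557 194.7616 624.3498  0.95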

Repeat the simulation many times

Using do()

set.seed(314159)
SimsExp <- do(10000) * confint(t.test(rexp(15, rate = 1 / 314)))
head(SimsExp)
##   mean.of.x    lower    upper level .row .index
## 1  385.3649 163.7523 606.9776  0.95    1      1
## 2  343.7076 176.2650 511.1502  0.95    1      2
## 3  254.6690 122.4845 386.8536  0.95    1      3
## 4  356.4574 166.2427 546.6721  0.95    1      4
## 5  436.1487 183.8851 688.4124  0.95    1      5
## 6  275.0616 140.5288 409.5944  0.95    1      6
Sims2Exp <- 
  SimsExp |>
  mutate(
    cover = case_when(
      upper < 314 ~ " too low",   # leading space makes this level sort first in tally()
      lower > 314 ~ "too high",
      TRUE ~ "just right"
    )
  )
head(Sims2Exp)
##   mean.of.x    lower    upper level .row .index      cover
## 1  385.3649 163.7523 606.9776  0.95    1      1 just right
## 2  343.7076 176.2650 511.1502  0.95    1      2 just right
## 3  254.6690 122.4845 386.8536  0.95    1      3 just right
## 4  356.4574 166.2427 546.6721  0.95    1      4 just right
## 5  436.1487 183.8851 688.4124  0.95    1      5 just right
## 6  275.0616 140.5288 409.5944  0.95    1      6 just right
tally( ~ cover, data = Sims2Exp)
## cover
##    too low just right   too high 
##        846       9117         37
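
For the promised comparison with a normal population, the same recipe works with rnorm() in place of rexp(). This is only a sketch (the choice sd = 314, matching the standard deviation of our exponential population, is ours), and we leave the output for you to generate:

set.seed(314159)
SimsNorm <- do(10000) * confint(t.test(rnorm(15, mean = 314, sd = 314)))
# coverage here should be very close to the nominal 95%
tally( ~ (lower <= 314 & 314 <= upper), data = SimsNorm)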

Using a data frame scaffold

This approach has the advantage of making it easy to run several variations on our simulation all at once.

Create a grid:

expand.grid(rep = 1:3, n = c(15, 60)) 
##   rep  n
## 1   1 15
## 2   2 15
## 3   3 15
## 4   1 60
## 5   2 60
## 6   3 60
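
Nothing limits us to a single knob. If we also wanted to vary the population mean (a hypothetical extension; the code below does not use it), we could add another column to the grid:

expand.grid(rep = 1:2, n = c(15, 60), mean = c(100, 314))
##   rep  n mean
## 1   1 15  100
## 2   2 15  100
## 3   1 60  100
## 4   2 60  100
## 5   1 15  314
## 6   2 15  314
## 7   1 60  314
## 8   2 60  314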

Add in the results of the simulation (and add many more reps, and a couple more sample sizes):

set.seed(314159)
SimsExpGrid <-
  expand.grid(rep = 1:10000, n = c(15, 30, 60, 100)) |>
  group_by(rep, n) |>
  mutate(exp_interval(n, mean = 314))  # using our custom function; 314 matches the coverage check below

head(SimsExpGrid)
## # A tibble: 6 × 6
## # Groups:   rep, n [6]
##     rep     n `mean of x` lower upper level
##   <int> <dbl>       <dbl> <dbl> <dbl> <dbl>
## 1     1    15       175.   83.1  266.  0.95
## 2     2    15       117.   74.8  160.  0.95
## 3     3    15        83.7  58.5  109.  0.95
## 4     4    15       142.   79.7  204.  0.95
## 5     5    15       119.   46.6  192.  0.95
## 6     6    15       119.   71.5  167.  0.95

Note: group_by() causes mutate() to be applied separately to the sub-data frames determined by rep and n. It is important that rep and n together uniquely identify a single row of the grid. If that were not the case, we would need to do something more complicated in the mutate() command to control how vectorization is used. If we adopted one of those more complicated approaches, we wouldn’t need the group_by() here.2
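
For the curious, here is one purrr-flavored sketch that avoids the group_by() trick: map our exp_interval() function over the n column to get a list of one-row data frames, then unnest them. (This assumes the purrr and tidyr packages; SimsExpGrid_alt is just a name we made up.)

library(purrr)
library(tidyr)

set.seed(314159)
SimsExpGrid_alt <-
  expand.grid(rep = 1:10000, n = c(15, 30, 60, 100)) |>
  mutate(ci = map(n, ~ exp_interval(.x, mean = 314))) |>  # one CI per row
  unnest(ci)                                              # spread each CI into columns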

As before, we can add in some additional information:

SimsExpGrid2 <-
  SimsExpGrid |>
  ungroup() |>
  mutate(cover = 
           case_when(
             upper < 314 ~ " too low",
             lower > 314 ~ "too high",
             TRUE ~ "just right"
           )
         ) 

Alternatively, we could have added this extra information into our custom function. That would be especially useful if we were running a lot of similar simulations, since we would only need to include that code in one place.
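
Here is a sketch of what that might look like (exp_interval2 is a hypothetical name). One advantage: the function can use its mean argument for the coverage check, so the population mean appears in only one place.

exp_interval2 <- function(n, mean) {
  confint(t.test(rexp(n, rate = 1 / mean))) |>
    mutate(
      cover = case_when(
        upper < mean ~ " too low",   # leading space sorts this level first
        lower > mean ~ "too high",
        TRUE ~ "just right"
      )
    )
}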

Now let’s look at the results:

tally( cover ~ n, data = SimsExpGrid2)
##             n
## cover           15    30    60   100
##    too low    9634  9992 10000 10000
##   just right   366     8     0     0

We see that things work better as the sample size increases, but clearly things don’t look as good as they did when sampling from a normal population. Also, the misses are not symmetric: we are more likely to miss low than to miss high.
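
If we prefer coverage rates to raw counts, tally() can report proportions instead (output not shown):

tally(cover ~ n, data = SimsExpGrid2, format = "proportion")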

P-values

We could do this with p-values instead.

set.seed(12345)
PvalSims <-
  expand.grid(rep = 1:10000, n = c(15, 30, 60, 100)) |>
  group_by(rep, n) |>
  mutate(pval = pval(t.test(rexp(n, rate = 1/314), mu = 314)))

head(PvalSims)
## # A tibble: 6 × 3
## # Groups:   rep, n [6]
##     rep     n   pval
##   <int> <dbl>  <dbl>
## 1     1    15 0.984 
## 2     2    15 0.150 
## 3     3    15 0.852 
## 4     4    15 0.326 
## 5     5    15 0.366 
## 6     6    15 0.0580
gf_qq( ~ pval | n, data = PvalSims, distribution = qunif) |>
  gf_abline(slope = ~1, intercept = ~0, color = "red") 

# zooming in on the most interesting part of the distribution (small p-values)
gf_qq( ~ pval | n, data = PvalSims, distribution = qunif) |>
  gf_abline(slope = ~1, intercept = ~0, color = "red") |>
  gf_lims(x = c(0, 0.15), y = c(0, 0.15))
## Warning: Removed 34000 rows containing missing values (geom_point).

We see from this simulation that the small p-values are too small, especially for the smaller sample sizes. Using this method would cause us to reject the null hypothesis too readily.
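
One way to quantify this is to estimate the rejection rate at a few common significance levels; for a well-calibrated test each rate would be close to its significance level. A sketch (output not shown):

PvalSims |>
  ungroup() |>
  group_by(n) |>
  summarise(
    reject_05 = mean(pval < 0.05),  # nominally 0.05
    reject_01 = mean(pval < 0.01)   # nominally 0.01
  )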


  1. The alternative would be to work out the distribution theoretically. In some situations we can do this, but it is generally a lot more work, and things don’t always work out as nicely as they did when we assumed we had an iid random sample from a Norm(\(\mu, \sigma\)) population.↩︎

  2. If you are curious to know how else this might have been done, look at the sapply() function and the functions in the purrr package.↩︎