Confidence Intervals

General form

\[ \mbox{estimate} \pm \mbox{critical value} \cdot SE \]

Using the general form

To use the general form, we just need to fill in the three slots.

quantity of interest estimand estimate critical value standard error
mean \(\mu\) \(\overline{x}\) \(t_*\) \(\displaystyle \frac{s}{\sqrt{n}}\)
proportion \(p\) \(\hat p\) \(z_*\) \(\displaystyle \sqrt{\frac{\hat p (1-\hat p)}{n}}\)

If you get in the habit of starting from the general form and then filling in the three slots, it will help you keep things straight (and prepare you for some things yet to come).

  • Identify the correct row

  • Stay in that row

Confidence interval for a difference in means.

Suppse we want to estimate how different (on average) two groups are. For example, we want to estiamte the difference between the average male height and the average female height. Here are some (older) data from a study done by Galton.

df_stats(height ~ sex, data = Galton)
##   response sex min   Q1 median   Q3  max mean   sd   n missing
## 1   height   F  56 62.5   64.0 65.5 70.5 64.1 2.37 433       0
## 2   height   M  60 67.5   69.2 71.0 79.0 69.2 2.63 465       0
gf_boxplot(height ~ sex, data = Galton) |
gf_violin(height ~ sex, data = Galton)

It’s pretty clear that the men are taller on average. The difference in means (in our sample) is a bit over 5 inches.

diffmean( height ~ sex, data = Galton)
## diffmean 
##     5.12

But how accurate is that estimate? We would like a confidence interval for it.

Using our general form

\[ \mbox{estimate} \pm \mbox{critical value} \cdot SE \]

  • the estimate is easy, we already have that
  • the critical value and standard error will depend on the shape of the sampling distribution

So what do we know about the sampling distribution for \(\overline X_1 - \overline X_2\)?

  • \(\displaystyle \overline{X_1} \approx {\sf Norm}\left(\mu_1, \frac{\sigma_1}{\sqrt{n_1}}\right)\)

  • \(\displaystyle \overline{X_2} \approx {\sf Norm}\left(\mu_2, \frac{\sigma_2}{\sqrt{n_2}}\right)\)

  • So \(\overline{X_1} - \overline{X_2}\)

    • will be approximatley normal (with the usual assumptions about sample size, etc.)

    • will have a mean of \(\mu_1 - \mu_2\)

    • will have a variance \(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\).

    • will have a standard deviation of \(SE = \sqrt{\frac{\sigma^2_1}{n_1} + \frac{\sigma_2^2}{n_2}}\).

      • \(SE\) because this is the standard deviation of a sampling distribution
  • Putting this all togther we have

\[\begin{align*} \overline{X_1} - \overline{X_2} & \approx {\sf Norm}\left(\mu_1 - \mu_2, \sqrt{\frac{\sigma^2_1}{n_1} + \frac{\sigma_2^2}{n_2}}\right) \\ \frac{(\overline{X_1} - \overline{X_2}) - (\mu_1 - \mu_2)}{ \sqrt{\frac{\sigma^2_1}{n_1} + \frac{\sigma_2^2}{n_2}}} \approx {\sf Norm}(0, 1) \end{align*}\]


Unfortunately, we don’t know \(\sigma_1\) and \(\sigma_2\), so we will approximate them with \(s_1\) and \(s_2\). It can be shown that

\[ \frac{(\overline{X_1} - \overline{X_2}) - (\mu_1 - \mu_2)}{ \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \approx {\sf T}(\mathrm{df} = ?) \]

  • \(df\) is given by a complicated formula, but is always satisfies

\[ \min(\mathrm{df}_1, \mathrm{df}_2) \le \mathrm{df} \le \mathrm{df}_1 + \mathrm{df}_2 \]

  • df will be closer to the high end when the sample sizes and standard deviations are very similar and closer to the low end when they are quite different.


df_stats(height ~ sex, data = Galton)
##   response sex min   Q1 median   Q3  max mean   sd   n missing
## 1   height   F  56 62.5   64.0 65.5 70.5 64.1 2.37 433       0
## 2   height   M  60 67.5   69.2 71.0 79.0 69.2 2.63 465       0
estimate <- 69.2 - 64.1; estimate 
## [1] 5.1
SE <- sqrt( 2.37^2 / 433 + 2.63^2 / 465); SE
## [1] 0.167
t.star1 <- qt(0.975, df = 432); t.star1
## [1] 1.97
t.star2 <- qt(0.975, df = 432 + 464); t.star2
## [1] 1.96

In this case, the value of \(t_*\) is nearly the same at both ends of the range. When they are more different, when computing by hand, use the larger value of \(t_*\) (the one that comes from the smaller degrees of freedom). That will give an interval that might be a bit wider than necessary, but that is better than giving an interval that is too narrow and exagerates the quality of our estimate.

So our interval is

estimate + c(-1,1) * t.star1 * SE
## [1] 4.77 5.43


Of course, R can do all the work for us. Here is the 95% confidence interval for the difference between the mean height of men and the mean height of women (in Galton’s era).

t.test(height ~ sex, data = Galton) %>% confint()
##   mean in group F mean in group M lower upper level
## 1            64.1            69.2 -5.45 -4.79  0.95

Notice that the degrees of freedom is not an integer and is close to \(432 + 464 = 896\) because the sample sizes and standard deviations are quite similar for men and for wemen:

t.test(height ~ sex, data = Galton) 
## 
##  Welch Two Sample t-test
## 
## data:  height by sex
## t = -31, df = 895, p-value <2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.45 -4.79
## sample estimates:
## mean in group F mean in group M 
##            64.1            69.2


quantity of interest estimand estimate critical value standard error
mean \(\mu\) \(\overline{x}\) \(t_*\) \(\displaystyle \frac{\sigma}{\sqrt{n}}\)
proportion \(p\) \(\hat p\) \(z_*\) \(\displaystyle \sqrt{\frac{\hat p (1-\hat p)}{n}}\)
difference in means \(\mu_1 - \mu_2\) \(\overline{x_1} - \overline{x_2}\) \(t_*\) \(\displaystyle \sqrt{ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\)

Some Practice

1. Add another row to the table for a difference between two proportions. Follow the same approach we used for the difference between two means.

\[\begin{align*} SE(p_1) &= \sqrt{ \frac{p_1(1 - p_1)}{n_1} } \\ SE(p_2) &= \sqrt{ \frac{p_2(1 - p_2)}{n_2} } \\ SE(p_1 - p_2)^2 &= \frac{p_2(1 - p_2)}{n_1} + \frac{p_2(1 - p_2)}{n_2} \\ SE(p_1 - p_2) &= \sqrt{ \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} \end{align*}\]

So the expanded table is

quantity of interest estimand estimate critical value standard error
mean \(\mu\) \(\overline{x}\) \(t_*\) \(\displaystyle \frac{\sigma}{\sqrt{n}}\)
proportion \(p\) \(\hat p\) \(z_*\) \(\displaystyle \sqrt{\frac{\hat p (1-\hat p)}{n}}\)
difference in means \(\mu_1 - \mu_2\) \(\overline{x_1} - \overline{x_2}\) \(t_*\) \(\displaystyle \sqrt{ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\)
difference in proportions \(p_1 - p_2\) \(\hat{p_1} - \hat{p_2}\) \(z_*\) \(\displaystyle \sqrt{ \frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}}\)


2. One technology proposed to rehabiliate aging municipal pipeline networks involves threading a flexible liner through existing pipe. Two processes are considered for this (with and without fusion), and the tensile strength (psi) was measured for a sample of pipes prepared both ways. Here is a summary of the data.

response type min Q1 median Q3 max mean sd n missing
strength fused 2889 2908 3076 3312 3359 3108 205.9 8 0
strength nofusion 2511 2712 2788 3197 3257 2903 277.3 10 0

Compute a 95% confidence interval for the difference in mean tensile strength for the two methods.

You can access these data using

Liner <- Devore7::xmp09.07

Check your work using t.test(). Note: The answers won’t match exactly because your hand-calculation is uing a cruder approximation for the degrees of freedom.

Note: these are pretty small samples, so we need to worry about the shapes of the population distributions. It’s hard to assess shape with so little data, but create normal-quantile plots for the two types. Do you see any causes for alarm or do things look OK?


3. A study was conducted to assess the value of adding radiation therapy to chemotherapy for a particular kind of breast cancer. Here is a summary of the data.

therapy number of women in study number of women who survived 15 years
chemotherapy only 154 76
chemotherapy + radiation 164 98

Compute a 90% confidence interval and a 98% confidence interval for the difference in 15-year survival rates. Is there evidence that one treatment method is better than the other? Explain.

Again, we can let R do all of the work for us. Here is a fancy way to put three intervals into a single table.

bind_rows(
  prop.test( c(76, 98), c(154, 164), conf.level = 0.90) %>% confint(),
  prop.test( c(76, 98), c(154, 164), conf.level = 0.95) %>% confint(),
  prop.test( c(76, 98), c(154, 164), conf.level = 0.98) %>% confint()
) %>% pander()
prop 1 prop 2 lower upper level
0.4935 0.5976 -0.2018 -0.006333 0.9
0.4935 0.5976 -0.2193 0.01118 0.95
0.4935 0.5976 -0.2397 0.03155 0.98


4. Stress limits (in MPa) were measured for samples of red oak and douglas fur lumber. Here is a summary of the data.

type n mean sd
oak 14 8.48 0.79
fir 10 6.65 1.28

Compute a 95% confidence interval for the difference in average stress limit for the two types of lumber. (Since we don’t have access to the raw data, we will have to assume that the populations are reasonably close to normal.)


5. The following data set contains data from an experiment to see whether medical professionals could access images from a digital medical data base (via a web front end) faster than from a library of slides. Thirteen subjects were given both kinds of retrieval tasks and their time in seconds was recorded.

MedicalImages <- Devore7::xmp09.10
head(MedicalImages) %>% pander()
slide digital
30 25
35 16
40 15
25 15
20 10
30 20

In this situation, the samples of times are probably NOT independent since it is likely that some people are faster at both tasks and others are slower. So we can’t use the method we have been using to compare two means. But there is another way.

We can compute the difference person by person and then create a confidence interval for those differences. This is called a paired design becuase we have pairs of related measurements and we are interested in their difference. Here’s how to create the new variable.

MedicalImages <-
  MedicalImages %>%
  mutate(diff = slide - digital) %>%
  mutate(ratio = slide / digital)
head(MedicalImages) %>% pander()
slide digital diff ratio
30 25 5 1.2
35 16 19 2.188
40 15 25 2.667
25 15 10 1.667
20 10 10 2
30 20 10 1.5
  1. Use this new variable to compute a 95% confidence interval for the average difference. (Let R do the work.)

  2. Repeat this with ratios rather than differences.

  3. Which should we use, ratios or differences? One way to judge would be to look to see which of the two has a more nearly normal distribution. Create normal-quantile plots of each. Does one of the two methods look like the better option in this case? (Note: you can add a guideline to your QQ plots with gf_qqline(). The default for this connects the 25th and 75th percentiles, which can sometimes be a bit misleading, so use with caution. You can change this default with, for example line.p = c(0.1, 0.9).)

  4. What do you conclude? Is one of these methods faster?

Here’s a fancy version of the two normal-quantile plots, illustrating the use of line.p and placing them side-by-side using the patchwork package.

library(patchwork)
# | between two plots puts them side by side
gf_qq(~ratio, data = MedicalImages) %>% gf_qqline( line.p = c(0.15, 0.85)) | 
  gf_qq(~diff, data = MedicalImages) %>% gf_qqline(line.p = c(0.15, 0.85))


6. Want more practice? Go back and compute the confidence intervals in the previous problem by hand. (Use df_stats() to create a numerical summary, then do the rest of the computation yourself.) You should get the same results as you got using t.test().


7. A group of middle school kids collected dimes as a fund raiser. Before bringing the dimes to the bank, they wanted to estimate the number of dimes they had collected. Devise method to make this estimate. There is more than one way to do this, but stick to things that middle school kids could reasonably manage (perhaps with a little help from you, a friendly engineering student).