Standard Error

What is Standard Error?

Remember that the standard error is the standard deviation of a null distribution or a sampling distribution. If we are using randomization or bootstrap, we can estimate the standard error by simply taking the standard deviation of the resulting distribution. But in many situations, there are formulas for the standard error we can use to avoid simulation.
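For example, here is a minimal base-R sketch (using a small made-up sample `x`) of estimating the standard error of a sample mean by taking the standard deviation of a bootstrap distribution, with the formula-based value shown for comparison:

```r
# Minimal sketch: estimating the SE of a sample mean with the bootstrap.
# `x` is a small made-up sample; replace it with your own data.
set.seed(123)
x <- c(4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4, 5.8, 4.7)

boot_means <- replicate(10000, mean(sample(x, replace = TRUE)))

sd(boot_means)           # bootstrap estimate of the standard error
sd(x) / sqrt(length(x))  # formula-based SE, for comparison
```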

What standard error does and does not measure

Standard error measures sample-to-sample variability. This variability explains why samples, even under ideal conditions, don’t match populations exactly, and it helps us quantify how much we can learn about the population from our sample. But standard error does not account for other reasons why our sample might not match the population. For example, standard error does not measure problems like these:

  • The sample is not a representative selection from the population of interest.
    • Individuals in the population were not equally likely to be in the sample
    • Available sample is not quite like the intended population in some way
    • Missing data (including non-response)
  • The individual cases in the sample were not selected independently.
  • The form of the questions asked or the way variables were measured tends to push things in one direction or the other.

Some of these problems can be addressed by more complicated statistical procedures, but the methods we have developed in this course generally assume ideal sampling conditions in order to be valid.

Four SE formulas

Here are our four SE formulas arranged in a table based on the parameter of interest and the number of groups. When there are two groups, we are interested in the difference in proportions or difference in means.

| parameter type | one group | two groups |
|----------------|-----------|------------|
| proportion | \(\displaystyle SE = \sqrt{\frac{p (1-p)}{n}}\) | \(\displaystyle SE = \sqrt{ \frac{p_1 (1-p_1)}{n_1} + \frac{p_2 (1-p_2)}{n_2}}\) |
| mean | \(\displaystyle SE = \frac{\sigma}{\sqrt{n}}\) | \(\displaystyle SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\) |

The table above will be provided on Test 3 and on the final exam.

But we don’t know \(p\) and \(\sigma\)!

This is true! But we can estimate them, substituting in a value that comes either from the null hypothesis or from our sample data.
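As a sketch of what that substitution looks like, the hypothetical one-proportion example below (all numbers made up) uses the null value \(p_0\) in the SE for a test and the sample proportion \(\hat{p}\) in the SE for a confidence interval:

```r
# Sketch: substituting estimates into the one-proportion SE formula.
# Hypothetical numbers: 83 "successes" in a sample of n = 200; null value p0 = 0.5.
n     <- 200
p_hat <- 83 / n
p0    <- 0.5

SE_test <- sqrt(p0 * (1 - p0) / n)        # test: plug in the value from H0
SE_ci   <- sqrt(p_hat * (1 - p_hat) / n)  # CI: plug in the sample proportion

z <- (p_hat - p0) / SE_test               # test statistic
2 * pnorm(-abs(z))                        # two-sided p-value
p_hat + c(-1, 1) * 1.96 * SE_ci           # approximate 95% confidence interval
```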

A note on paired designs

When we have a paired design, the first step is to convert the two variables into a single variable (usually by subtraction, but sometimes a ratio is used instead). After that conversion, we are left with a single quantitative variable, and we are interested in the mean of that variable. So the paired situation is just a special case (with an extra step) of the one-mean situation.
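A minimal sketch of that conversion, using made-up before/after measurements:

```r
# Sketch of the paired-design conversion, with made-up before/after measurements.
before <- c(12.1, 10.4, 11.8, 13.0,  9.7, 12.5)
after  <- c(11.2,  9.9, 11.1, 12.2,  9.1, 11.8)

diffs <- after - before          # one quantitative variable: the differences

mean(diffs)
sd(diffs) / sqrt(length(diffs))  # SE = sigma / sqrt(n), applied to the differences
t.test(diffs)                    # same as t.test(after, before, paired = TRUE)
```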

Standard Error in Regression

We won’t learn the standard error formulas for the intercept, slope, or predictions of a linear model, but output from statistical software provides those values for us or uses them to do other calculations (like p-values and confidence intervals). The standard error formulas are similar to the ones above (the numerator includes a measure of variability in the population and the denominator includes a measure of sample size) but more complicated.
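For instance, in R these values appear in the `Std. Error` column of the coefficient table; the sketch below uses the built-in `mtcars` data purely as a stand-in for whatever data you are working with:

```r
# Sketch: where regression standard errors show up in R output.
fit <- lm(mpg ~ wt, data = mtcars)   # mtcars is just a stand-in dataset

summary(fit)                         # coefficient table includes a Std. Error column
coef(summary(fit))[, "Std. Error"]   # pull out just the standard errors
confint(fit, level = 0.95)           # confidence intervals computed from those SEs
```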

Chi-Squared is the odd one out

Chi-squared (goodness of fit and 2-way tables) is a bit different from the other situations we have covered.

  • The emphasis is on a hypothesis test (we didn’t talk about confidence intervals in this context)
  • We did not do anything with standard errors. (In ANOVA, standard errors are involved in doing Tukey’s Honest Significant Differences, but we let the computer do all the work for us.)

The Chi-squared test statistic can be computed from a table of counts using

\[X^2 = \sum \frac{(\mathrm{observed} - \mathrm{expected})^2}{\mathrm{expected}} \;.\]

  • Expected cell counts are determined by the null hypothesis.
    • Goodness of Fit: \(\mathrm{expected}_i = p_i n\).
    • 2-way tables: \(\displaystyle \mathrm{expected} = \frac{\mathrm{row\ total \ \cdot \ column\ total}}{\mathrm{grand\ total}}\)
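As a check on the formula, here is a small sketch that computes \(X^2\) “by hand” for a made-up 2-way table of counts and compares it to R’s chisq.test():

```r
# Sketch: computing the chi-squared statistic "by hand" for a made-up 2-way table
# and checking it against chisq.test().
observed <- matrix(c(30, 20,
                     10, 40), nrow = 2, byrow = TRUE)

# expected = row total * column total / grand total, for every cell
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

X2 <- sum((observed - expected)^2 / expected)
X2

chisq.test(observed, correct = FALSE)  # same statistic, plus df and a p-value
```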

Design a study

In each scenario below, describe how you would design the study and what analysis method you would use for the resulting data.

Sometimes there may be more than one way to design the study, but don’t design a poor study when a better option is available.

  1. You want to know what proportion of Calvin students got a flu shot this year.

  2. You want to know whether male students or female students were more likely to get a flu shot this year.

  3. You want to know which of three diet plans is most effective at helping people lose weight.

  4. You want to know whether rhubarb grows faster or slower if you cover it with a bucket for 3 weeks.

  5. You want to know whether people can swim faster if they wear wet suits.

  6. You want to know if there is an association between education level and smoking.

Make up additional examples for any of the scenarios we have covered in class that were not used above. (You can also make up additional examples like these just to practice identifying the correct analysis method for a study.)

Practice with the formulas

  7. Write down the word equation for a test statistic based on a standard error and a normal or t distribution.

  8. Write down the word equation for a confidence interval based on a standard error and a normal or t distribution.

In each of the situations in problems 9 and 10, compute a p-value or confidence interval (or both).

  9. In a study to compare the endurance of male and female mice, mice were made to swim in a bucket with a weight attached to their tail and rescued when they became exhausted. The table below gives some information about the distribution of these “times to exhaustion” (in minutes).

    | sex    | n   | mean | sd    |
    |--------|-----|------|-------|
    | female | 162 | 11.4 | 26.09 |
    | male   | 135 | 6.7  | 6.69  |
    a. Both distributions (for females and males) were skewed. Which direction do you think they were skewed? Why?
    b. Why is it OK to use our SE method in this situation even though the sample distributions are skewed?
    c. Give a 98% confidence interval for the mean endurance for female mice.
    d. Give a 98% confidence interval for the mean endurance for male mice.
    e. Give a 98% confidence interval for the difference in mean endurance between female and male mice.
    f. Is there evidence that endurance varies by sex? How strong is the evidence?

  10. Use the data in StudentSurvey (from Lock5withR) to answer the following questions. In each case, some summary output is provided. That should be all you need.

    (This is a sample of students from one particular university, so we can only generalize results to students at that university or perhaps to students at “similar universities” – and then only if the sample was reasonably representative.)

    a. Do men exercise more than women? How much more?

      | response | Sex    | min | Q1 | median | Q3 | max | mean  | sd    | n   | missing |
      |----------|--------|-----|----|--------|----|-----|-------|-------|-----|---------|
      | Exercise | Female | 0   | 4  | 7      | 12 | 27  | 8.11  | 5.199 | 168 | 1       |
      | Exercise | Male   | 0   | 5  | 10     | 14 | 40  | 9.876 | 6.069 | 193 | 0       |
    b. Are men more likely to be smokers than women? If so, how much more likely?

      | response | Sex    | prop_No | prop_Yes | n   |
      |----------|--------|---------|----------|-----|
      | Smoke    | Female | 0.9053  | 0.09467  | 169 |
      | Smoke    | Male   | 0.8601  | 0.1399   | 193 |
    c. Give a 95% confidence interval for the slope of a regression of weight on height for the men based on this study. How would you interpret this slope?

      Fitting linear model: Weight ~ Height

      |             | Estimate | Std. Error | t value | Pr(>\|t\|) |
      |-------------|----------|------------|---------|------------|
      | (Intercept) | -69.06   | 44.17      | -1.564  | 0.1196     |
      | Height      | 3.496    | 0.6228     | 5.613   | 7.119e-08  |

    d. What is the average number of piercings for women at this university?

      | response  | Sex    | min | Q1 | median | Q3 | max | mean   | sd     | n   | missing |
      |-----------|--------|-----|----|--------|----|-----|--------|--------|-----|---------|
      | Piercings | Female | 0   | 2  | 3      | 5  | 10  | 3.379  | 1.991  | 169 | 0       |
      | Piercings | Male   | 0   | 0  | 0      | 0  | 5   | 0.1719 | 0.7566 | 192 | 1       |
  11. Since we have the data for problem 10, we could get R to do all the work using t.test(), lm(), and prop.test()[^1]. Do that.

    (Reminder: To get just the men for part c, you can use StudentSurvey %>% filter(...).)
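    One possible shape for those calls is sketched below (assuming Lock5withR and dplyr are loaded); treat it as a starting point rather than a worked solution:

    ```r
    # A sketch of the call patterns, not a full solution.
    library(Lock5withR)
    library(dplyr)

    # part a: difference in mean exercise hours by sex
    t.test(Exercise ~ Sex, data = StudentSurvey)

    # part b: difference in the proportion of smokers
    # (note: prop.test() on this table compares the proportion in the first column)
    prop.test(table(StudentSurvey$Sex, StudentSurvey$Smoke))

    # part c: regression of weight on height, men only
    men <- StudentSurvey %>% filter(Sex == "Male")
    fit <- lm(Weight ~ Height, data = men)
    summary(fit)
    confint(fit)

    # part d: mean number of piercings for women
    women <- StudentSurvey %>% filter(Sex == "Female")
    t.test(women$Piercings)
    ```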

  12. For part c of problem 10, we should really perform some checks to make sure that the linear model is OK to use here. What four things are we looking for? Perform the checks and state your conclusions.

  13. Go back to some of the situations in problems 9 and 10 and create a randomization or bootstrap distribution, then

    a. Use sd() to estimate the standard error.
    b. Compute the p-value or confidence interval.

    How do these results compare with the answer you got using formulas?
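    As one possible starting point, here is a base-R sketch of a bootstrap distribution for the difference in mean exercise hours (the course's randomization tools would work just as well):

    ```r
    # Base-R sketch: bootstrap SE for the difference in mean Exercise hours
    # (male minus female), estimated with sd() of the bootstrap distribution.
    library(Lock5withR)

    ss <- subset(StudentSurvey, !is.na(Exercise))

    boot_diff <- replicate(5000, {
      b <- ss[sample(nrow(ss), replace = TRUE), ]
      mean(b$Exercise[b$Sex == "Male"]) - mean(b$Exercise[b$Sex == "Female"])
    })

    SE  <- sd(boot_diff)                   # standard error estimated with sd()
    obs <- mean(ss$Exercise[ss$Sex == "Male"]) - mean(ss$Exercise[ss$Sex == "Female"])
    obs + c(-1, 1) * 1.96 * SE             # approximate 95% confidence interval
    ```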


[^1]: prop.test() is slightly different from the method we learned. (1) It uses a Chi-squared statistic instead of a z statistic; Chi-squared is just the square of z when df = 1. (2) By default, prop.test() uses a “continuity correction” to improve its accuracy. If you turn that off with correct = FALSE, then the results will be more similar to your hand calculations.