Which is which?

Each example below includes several plots. Identify which plot shows

  1. the original sample
  2. a bootstrap distribution
  3. a randomization distribution

Also, some of the labeling has been turned off for these plots. Add appropriate labeling.

CAOS item

The Comprehensive Assessment of Outcomes in Statistics (CAOS) exam is a standardized test for assessing students’ knowledge of statistical concepts. On one of the more difficult questions, the national average is 42% correct. In one class, 17 of 30 students answered the question correctly.
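To show how the three kinds of plots differ, here is a base-R sketch that builds all three for the CAOS item. It uses only the numbers given (17 of 30 correct, null proportion 0.42). The variable names are illustrative. In class, mosaic's `rflip()`, `resample()`, and `do()` can do the same job.

```r
# Sketch: original sample, randomization distribution, and bootstrap
# distribution for the CAOS item (17 of 30 correct; national rate 0.42).
set.seed(2024)  # arbitrary seed so the simulation is reproducible

n_correct  <- 17
n_students <- 30
p_null     <- 0.42                     # national average under H0
p_hat      <- n_correct / n_students   # observed class proportion
reps       <- 10000

# Original sample: 1 = answered correctly, 0 = answered incorrectly
original <- c(rep(1, n_correct), rep(0, n_students - n_correct))

# Randomization distribution: class proportions we would see if each
# student were correct with probability 0.42 (the null hypothesis)
rand_dist <- rbinom(reps, size = n_students, prob = p_null) / n_students

# Bootstrap distribution: resample the 30 observed answers with replacement
boot_dist <- replicate(reps, mean(sample(original, replace = TRUE)))

# One-sided p-value: share of null proportions at least as large as p_hat
p_value <- mean(rand_dist >= p_hat)

# 95% percentile bootstrap confidence interval for the class proportion
ci <- quantile(boot_dist, c(0.025, 0.975))
```

Plotting `barplot(table(original))`, `hist(rand_dist)`, and `hist(boot_dist)` side by side shows the difference. The original sample takes only two values (0 and 1). The bootstrap distribution is centered near the observed 17/30 ≈ 0.567. The randomization distribution is centered near 0.42.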

Gallup Poll: Smoking Ban

In 2000, 40% of people surveyed favored a smoking ban in restaurants. In 2010, 59% of people surveyed favored a smoking ban in restaurants. We are interested in knowing how large a shift in public opinion this represents.
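"How large a shift" calls for a confidence interval. The sketch below bootstraps the difference in proportions by resampling each survey separately. The worksheet does not give the survey sizes, so `n_2000` and `n_2010` are hypothetical placeholders (1000 each), and the resulting interval is only illustrative. `boot_diff_prop()` is a hypothetical helper name.

```r
# Sketch: bootstrap distribution for the change in the proportion
# favoring a restaurant smoking ban between two independent surveys.
# ASSUMPTION: sample sizes are not given in the text; the values of
# n_2000 and n_2010 below are hypothetical placeholders.

boot_diff_prop <- function(p1, n1, p2, n2, reps = 10000) {
  # Rebuild each survey as 0/1 responses (1 = favors a ban)
  x1 <- round(p1 * n1)
  x2 <- round(p2 * n2)
  survey1 <- c(rep(1, x1), rep(0, n1 - x1))
  survey2 <- c(rep(1, x2), rep(0, n2 - x2))
  # Resample each survey on its own, then take later minus earlier
  replicate(reps,
            mean(sample(survey2, replace = TRUE)) -
              mean(sample(survey1, replace = TRUE)))
}

set.seed(2024)
n_2000 <- 1000  # hypothetical sample size for the 2000 survey
n_2010 <- 1000  # hypothetical sample size for the 2010 survey
boot_dist <- boot_diff_prop(0.40, n_2000, 0.59, n_2010)

# 95% percentile interval for the shift in opinion (2010 minus 2000)
ci <- quantile(boot_dist, c(0.025, 0.975))
```

Resampling the two surveys separately mirrors the design: two independent samples of people, not one sample measured twice. Larger real sample sizes would make the bootstrap distribution, and so the interval, narrower.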

Body Temperature

df_stats(~BodyTemp, data = BodyTemp50) %>% pander()
response   min    Q1    median  Q3    max    mean   sd      n   missing
BodyTemp   96.4   97.8  98.2    98.8  100.8  98.26  0.7653  50  0
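A quick calculation from this summary can help tell the plots apart. It assumes the usual approximation that the standard deviation of a bootstrap distribution of sample means is close to \(s/\sqrt{n}\):

\[ SE \approx \frac{s}{\sqrt{n}} = \frac{0.7653}{\sqrt{50}} \approx \frac{0.7653}{7.071} \approx 0.108 \]

The original sample spreads from 96.4 to 100.8 degrees. A bootstrap distribution of means should instead be centered near 98.26, with a spread of only about a tenth of a degree. A randomization distribution for a test about the mean should have a similar spread but be centered at the hypothesized value. For example, it would be centered at 98.6 if that is the null value; the worksheet does not state it.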


SAT Prep

Imagine the following hypothetical study.

In an experiment to see how much an SAT prep course helps, 2000 students are randomly assigned to two groups. One group receives the prep course; the other does not. Each group takes the SAT twice, and the prep course group takes the course between the two tests. In each group some students do better and some do worse, but in both groups most do better. The average improvement for each group is shown in the table below.

Group          Average improvement
Prep Course    42.7
No Course      38.5

  1. Why does this study have students take the SAT twice?
  2. What do we call a study design like this?
  3. The 95% confidence interval for the difference in mean improvement was \((1.04, 7.36)\). Sketch what you think the bootstrap distribution looks like.
  4. Suppose we calculated a p-value from the same data. What would the null and alternative hypotheses be? Approximately what would the p-value be? Sketch what you think the randomization distribution looks like.
  5. Would you pay $3000 for this prep course? $300? $30? $3? Explain.
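A quick arithmetic check can guide the sketches in items 3 and 4. The observed difference in average improvement matches the midpoint of the reported interval:

\[ 42.7 - 38.5 = 4.2, \qquad \frac{1.04 + 7.36}{2} = 4.2 \]

Assuming the interval was built as roughly estimate \(\pm 2\,SE\), its half-width gives the approximate spread of the bootstrap distribution:

\[ SE \approx \frac{7.36 - 1.04}{2 \cdot 2} = \frac{6.32}{4} = 1.58 \]

Under that assumption, a bootstrap distribution would be centered near 4.2 with a standard deviation of about 1.6. A randomization distribution would have roughly the same spread but be centered at 0.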

Hypothesis Test and Confidence Interval Inventory

We have seen how to create randomization distributions (for hypothesis tests) and bootstrap distributions (for confidence intervals) in a number of different situations. Our methods work quite generally, but we have focused our attention mainly on six situations, which we could call

For each of these situations,

  1. Recall a scenario or question we have seen that fits the label.

  2. Create a new scenario or question that fits the label.

  3. Determine what variables you would need in your data set and whether they are numerical or categorical.

  4. Write down null and alternative hypotheses in both words and symbols.

  5. Write down the code to compute your test statistic.

  6. Write down the code to create a randomization distribution.

  7. Write down the code to create a bootstrap distribution.
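Here is one possible answer to items 5 to 7 for a single situation: comparing the means of two groups, assuming that is one of the six. It is a base-R sketch. The function names and the arguments `response` (numerical) and `group` (categorical, two levels) are hypothetical. In class, the same steps can be written with mosaic's `diffmean()`, `shuffle()`, `resample()`, and `do()`.

```r
# Sketch for one situation: difference in means between two groups.
# `response` (numerical) and `group` (categorical, two levels) are
# hypothetical argument names for illustration.

# Test statistic: mean of the second level minus mean of the first
diff_means <- function(response, group) {
  lv <- levels(factor(group))
  mean(response[group == lv[2]]) - mean(response[group == lv[1]])
}

# Randomization distribution: under H0 the group labels are
# interchangeable, so shuffle them and recompute the statistic
randomization_dist <- function(response, group, reps = 5000) {
  replicate(reps, diff_means(response, sample(group)))
}

# Bootstrap distribution: resample with replacement within each group,
# so both group sizes stay the same as in the original data
bootstrap_dist <- function(response, group, reps = 5000) {
  replicate(reps, {
    idx <- unlist(lapply(split(seq_along(response), group),
                         function(i) i[sample.int(length(i), replace = TRUE)]))
    diff_means(response[idx], group[idx])
  })
}
```

For a hypothetical data frame `df`, the observed statistic is `observed <- diff_means(df$response, df$group)`. A two-sided p-value is then `mean(abs(randomization_dist(df$response, df$group)) >= abs(observed))`.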

Note: There is one situation above that we don’t know how to deal with yet. Which one? Can you figure out how to handle it?

Two Probabilities

The p-value and confidence level are both just probabilities. Choose one of your scenarios above and say carefully what each of these probabilities is in the context of your scenario. Be sure your answer addresses the “of what” question. (That is, don’t say vague things like “of the time”.)
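One way to see what the confidence level is a probability *of* is to simulate it directly. The sketch below draws many samples from a population whose mean we choose ourselves, builds a 95% percentile bootstrap interval from each sample, and counts how many intervals capture the true mean. The population (normal, mean 50, SD 10), the sample size, and the numbers of repetitions are all hypothetical choices for illustration.

```r
# Sketch: what "95% confidence" is a proportion of.
# The population and all sizes below are hypothetical choices.
set.seed(2024)
true_mean <- 50    # known only because we are simulating the population
n         <- 30    # size of each random sample
n_samples <- 200   # number of independent random samples
reps      <- 1000  # bootstrap resamples per sample

covers <- replicate(n_samples, {
  x <- rnorm(n, mean = true_mean, sd = 10)                 # one random sample
  boot_means <- replicate(reps, mean(sample(x, replace = TRUE)))
  ci <- quantile(boot_means, c(0.025, 0.975))               # 95% percentile interval
  ci[1] <= true_mean && true_mean <= ci[2]                  # did it capture the truth?
})

# Proportion of intervals (one per random sample) containing the true mean
coverage <- mean(covers)
```

`coverage` estimates the confidence level as a proportion of intervals, one per random sample, that contain the true parameter. For percentile bootstrap intervals from samples this small, it typically lands close to, and sometimes a bit below, 0.95.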