Logistics

Date: Tuesday, October 27

Material Covered:

The test is cumulative, but it will emphasize the more recent topics (probability, confidence intervals, and hypothesis tests).

Technology allowed: You may use RStudio on your laptop. No other software may be running, and you may not browse to other sites, so close everything else before the test.

Accommodations: If you require testing accommodations, please contact me so we can make the appropriate arrangements.

Topics

Here is a list of things you should be sure you know how to do. It is not intended to be an exhaustive list, but it is an important list. You should be able to:

Some sample problems

  1. What do I do?

    In each of the following situations, imagine you want to answer a question and are designing a statistical study to do so. For each, give: (i) the variables you would need in your data set, (ii) whether each variable is categorical or quantitative, (iii) the null and alternative hypotheses, or the parameter for which you would create a confidence interval, (iv) whether the study is an observational study or an experiment, and (v) the design elements you would use, such as randomization (of what, and how?), blinding, or a matched pairs design.

    a. You want to know whether boys or girls score better on reading tests in Kent County grade schools.

  2. For scenarios like the one above, you should be able to create a bootstrap distribution and use it to get a confidence interval, or a randomization distribution and use it to get a p-value.

  3. Be sure to show some work as you answer the following questions.

    A certain test is standardized in such a way that the mean score is 40 and the standard deviation is 5.

    a. What \(Z\)-score is associated with a test score of 48.5?
    b. Approximately what percentage of people score above 48.5 on the test?
    c. Approximately what percentage of people score between 37.0 and 48.5 on the test?
    d. Fred scored in the 65th percentile. What was his test score? What percent of the test takers did better than Fred?
  4. A certain IQ test is standardized in such a way that the mean score is 100 and the standard deviation is 10. For parts b), c), and d), please sketch the distribution and shade the appropriate region.

    a. What \(Z\)-score is associated with a score of 117?
    b. Approximately what percentage of people have IQs between 90 and 110?
    c. Approximately what percentage of people have IQs above 120?
    d. Approximately what percentage of people have IQs between 90 and 120?

Solutions

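  Randomization and bootstrap distributions
library(mosaic)
# ReadingStudy (hypothetical): one row per student, with reading_score and sex
# randomization distribution (for a p-value): shuffle sex to mimic H0
Randomization <-
  do(1000) * diffmean(reading_score ~ shuffle(sex), data = ReadingStudy)
# bootstrap distribution (for a confidence interval): resample students
Bootstrap <-
  do(1000) * diffmean(reading_score ~ sex, data = resample(ReadingStudy))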
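The mosaic functions above hide the resampling step. As a minimal base-R sketch of what shuffle() and resample() are doing, here is the same idea on simulated scores. The data frame `reading`, its columns, and the helper `diff_means()` are hypothetical stand-ins, not course data. The randomization test uses \(H_0: \mu_M - \mu_F = 0\) with a two-sided p-value computed as the proportion of shuffled differences at least as extreme as the observed one (one common convention), and the interval is a bootstrap percentile interval (one of several ways to turn a bootstrap distribution into a confidence interval).

```r
# ASSUMPTION: simulated data; names below are hypothetical, for illustration only
set.seed(1)
reading <- data.frame(
  sex   = rep(c("F", "M"), each = 30),
  score = c(rnorm(30, mean = 72, sd = 8), rnorm(30, mean = 68, sd = 8))
)

# difference in mean scores, M minus F
diff_means <- function(score, sex) {
  mean(score[sex == "M"]) - mean(score[sex == "F"])
}
observed <- diff_means(reading$score, reading$sex)

# randomization distribution: shuffle the sex labels, which mimics H0 (no difference)
randomization <- replicate(1000, diff_means(reading$score, sample(reading$sex)))
p_value <- mean(abs(randomization) >= abs(observed))  # two-sided p-value

# bootstrap distribution: resample students (rows) with replacement
bootstrap <- replicate(1000, {
  rows <- sample(nrow(reading), replace = TRUE)
  diff_means(reading$score[rows], reading$sex[rows])
})
ci_95 <- quantile(bootstrap, c(0.025, 0.975))  # 95% percentile interval

observed
p_value
ci_95
```

The key difference is what gets resampled: the randomization distribution breaks the link between sex and score (so it is centered near 0), while the bootstrap keeps each student's sex and score together (so it is centered near the observed difference).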
  Standardized test
# z-score
(48.5 - 40) / 5
## [1] 1.7
# score above 48.5
1 - pnorm(48.5, mean = 40, sd = 5)
## [1] 0.0446
1 - pnorm(1.7)
## [1] 0.0446
xpnorm(48.5, mean = 40, sd = 5)
## 
## If X ~ N(40, 5), then
##  P(X <= 48.5) = P(Z <= 1.7) = 0.9554
##  P(X >  48.5) = P(Z >  1.7) = 0.04457
## 

## [1] 0.955
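pnorm() reports the area to the left of a value by default. Its `lower.tail` argument is an alternative to subtracting from 1 when you want the area to the right, as in part b:

```r
# pnorm() returns the area to the LEFT by default;
# lower.tail = FALSE returns the area to the RIGHT instead
above <- pnorm(48.5, mean = 40, sd = 5, lower.tail = FALSE)
above   # about 0.0446, the same as 1 - pnorm(48.5, mean = 40, sd = 5)
```

For ordinary problems both forms give the same answer; `lower.tail = FALSE` mainly matters for values far out in the tail, where `1 - pnorm(...)` can round to 0.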
# between 37 and 48.5
pnorm(48.5, mean = 40, sd = 5) - pnorm(37, mean = 40, sd = 5)
## [1] 0.681
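The area between two scores can also be found on the \(Z\) scale, which is the form you would use with a normal table, or in one line with diff():

```r
# standardize both endpoints: (37 - 40)/5 = -0.6 and (48.5 - 40)/5 = 1.7
z <- (c(37, 48.5) - 40) / 5
between <- pnorm(z[2]) - pnorm(z[1])
between          # about 0.681

# same area without standardizing: diff() subtracts the lower cdf value from the upper
between_direct <- diff(pnorm(c(37, 48.5), mean = 40, sd = 5))
between_direct
```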
# 65th percentile
qnorm(.65, mean = 40, sd = 5)
## [1] 41.9
xqnorm(.65, mean = 40, sd = 5)
## 
## If X ~ N(40, 5), then
##  P(X <= 41.9) = 0.65
##  P(X >  41.9) = 0.35
## 

## [1] 41.9
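qnorm() goes in the opposite direction from pnorm(): it takes an area and returns a score. This sketch checks that round trip for Fred and answers the second half of part d (the area above his score):

```r
fred <- qnorm(0.65, mean = 40, sd = 5)   # Fred's score, about 41.9
# qnorm() inverts pnorm(): the area at or below Fred's score is 0.65
pnorm(fred, mean = 40, sd = 5)
# area above Fred's score = proportion of test takers who did better
better_than_fred <- pnorm(fred, mean = 40, sd = 5, lower.tail = FALSE)
better_than_fred   # 0.35, i.e. 35%
```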
  IQ test

These can all be approximated without R, using \(Z\)-scores and the 68-95-99.7 rule.

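  a. \(z = (117 - 100) / 10 = 17 / 10 = 1.7\)
  b. 90 and 110 are 1 standard deviation below and above the mean, so that's the middle 68%.
  c. 120 is 2 standard deviations above the mean, so about 2.5% (95% in the middle, 2.5% in each tail).
  d. 90 is 1 standard deviation below the mean and 120 is 2 above, so 1/2 of 68% + 1/2 of 95% = 81.5%.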
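For part d, the key step is splitting the region at the mean, because the 68% and 95% figures describe regions centered at the mean. On the \(Z\) scale, \(z = (90 - 100)/10 = -1\) and \(z = (120 - 100)/10 = 2\), so:

\[
\begin{aligned}
P(90 < X < 120) &= P(-1 < Z < 2) = P(-1 < Z < 0) + P(0 < Z < 2) \\
&\approx \tfrac{1}{2}(68\%) + \tfrac{1}{2}(95\%) = 34\% + 47.5\% = 81.5\%
\end{aligned}
\]

The exact value from the normal distribution is about 81.9%, so the rule-of-thumb answer is close.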

Or we can use R.

# z-score
(117 - 100) / 10
## [1] 1.7
# between 90 and 110 -- should be about 68%
pnorm(110, mean = 100, sd = 10) - pnorm(90, mean = 100, sd = 10)
## [1] 0.683
pnorm(1) - pnorm(-1)
## [1] 0.683
# above 120 -- should be about 2.5%
1 - pnorm(120, mean = 100, sd = 10)
## [1] 0.0228
1 - pnorm(2)
## [1] 0.0228
1 - xpnorm(120, mean = 100, sd = 10)
## 
## If X ~ N(100, 10), then
##  P(X <= 120) = P(Z <= 2) = 0.9772
##  P(X >  120) = P(Z >  2) = 0.02275
## 

## [1] 0.0228
# between 90 and 120
pnorm(120, mean = 100, sd = 10) - pnorm(90, mean = 100, sd = 10)
## [1] 0.819
xpnorm(c(90, 120), mean = 100, sd = 10)
## 
## If X ~ N(100, 10), then
##  P(X <=  90) = P(Z <= -1) = 0.1587   P(X <= 120) = P(Z <=  2) = 0.9772
##  P(X >   90) = P(Z >  -1) = 0.84134  P(X >  120) = P(Z >   2) = 0.02275
## 

## [1] 0.159 0.977
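xpnorm() draws the shaded sketches that parts b), c), and d) ask for. As a base-R sketch of the same picture, the hypothetical helper `shade_normal()` below (not part of mosaic or base R) plots a normal density with curve(), shades the region between `lo` and `hi` with polygon(), and returns the shaded probability:

```r
# shade_normal() is a hypothetical helper for illustration only.
# Plots the N(mean, sd) density, shades the area between lo and hi,
# and returns that area as a probability.
shade_normal <- function(lo, hi, mean = 0, sd = 1) {
  x_min <- mean - 4 * sd
  x_max <- mean + 4 * sd
  # clip infinite bounds (e.g. hi = Inf for "above 120") to the plotted range
  lo_plot <- max(lo, x_min)
  hi_plot <- min(hi, x_max)
  curve(dnorm(x, mean = mean, sd = sd), from = x_min, to = x_max,
        xlab = "score", ylab = "density")
  xs <- seq(lo_plot, hi_plot, length.out = 200)
  polygon(c(lo_plot, xs, hi_plot), c(0, dnorm(xs, mean, sd), 0), col = "gray")
  pnorm(hi, mean, sd) - pnorm(lo, mean, sd)
}

shade_normal(90, 110, mean = 100, sd = 10)   # part b
shade_normal(120, Inf, mean = 100, sd = 10)  # part c
shade_normal(90, 120, mean = 100, sd = 10)   # part d
```

On the test itself a hand sketch is enough; the point is the same: mark the mean, mark the cutoffs in standard deviation units, and shade the region whose area you want.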