Test 1

Date: Friday, October 2

Material Covered: Chapters 1 through 3 and Sections 5.1 and 5.3 of the IMS textbook

Technology allowed: You may use a calculator to help with arithmetic. You will not need RStudio for this test, but you may encounter R output or be asked to write down the R command you would use to do something.

If you require testing accommodations, please contact me so we can make the appropriate arrangements.

Topics

The items below are not meant to be an exhaustive list, but they comprise an important set of things you should be sure to know.

Important Terminology

You should know how to read and use these words correctly, you should be able to explain what they mean in your own words, and you should be able to give illustrative examples.

  • population, sample
  • parameter, statistic, test statistic
  • variable, case/subject/observational unit
  • categorical variable, quantitative variable
  • explanatory variable, response variable
  • observational study, experiment
  • null hypothesis, alternative hypothesis
  • test statistic, p-value, significance
  • randomization distribution, null distribution
  • predicted/fitted value, observed value, residual

Summarizing data

  • What numerical and graphical summaries are good for summarizing different kinds of data
  • Interpreting numerical and graphical summaries that you are given
  • Creating graphical and numerical summaries in R (df_stats(), gf_histogram(), gf_bar(), gf_boxplot(), etc.). See R Examples.
  • How outliers affect various numerical summaries
  • Two-way tables: How to create them with tally(), how to use them to obtain proportions
  • Pay attention to denominators of proportions and the important word OF (mean of ____, standard deviation of ____, proportion of ____ that ____, etc.)
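
As a quick refresher, here is a minimal sketch of the summary functions listed above. It assumes the mosaic package is installed and uses the KidsFeet data set from mosaicData purely for illustration; that data set is my choice here and is not named anywhere in this guide.

```r
library(mosaic)   # assumed installed; also attaches ggformula and mosaicData

# Numerical summaries of a quantitative variable, split by a categorical one
df_stats(length ~ sex, data = KidsFeet, mean, sd, median)

# Graphical summaries: quantitative, quantitative by group, categorical
gf_histogram(~ length, data = KidsFeet, bins = 10)
gf_boxplot(length ~ sex, data = KidsFeet)
gf_bar(~ domhand, data = KidsFeet)

# Two-way table of counts
tally(domhand ~ sex, data = KidsFeet)

# Same table as proportions. The denominator is the number of kids OF each sex,
# so each column adds up to 1.
hand_by_sex <- tally(domhand ~ sex, data = KidsFeet, format = "proportion")
hand_by_sex
```

Swapping the formula to sex ~ domhand changes the denominators (proportion OF left-handed kids who are boys, and so on), which is exactly the kind of detail to watch for.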

Study design

  • Identifying population and sample, observational units, variables
  • Deciding whether to do an observational study or an experiment
  • Avoiding bias, representative samples

Regression

  • Correlation coefficient (\(R\)) and its square (\(R^2\))
  • Principle of least squares
  • Using lm() to obtain slope and intercept of regression line
  • Using \(R\) and the means and standard deviations of the explanatory and response variables to compute the slope and intercept
  • Using least squares regression line to make a prediction
  • Interpreting slope and intercept in context
  • Units
  • Residuals and residual plots
  • Outliers and how they might affect regression
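
The relationship between \(R\), the means and standard deviations, and the least squares line can be checked with a short sketch. It uses R's built-in cars data (speed in mph, stopping distance in feet) only as a convenient illustration; the data set and the variable names b0 and b1 are assumptions made for this example, not something taken from the textbook.

```r
# Built-in data: 50 cars, explanatory = speed, response = dist
x <- cars$speed
y <- cars$dist

# Slope and intercept from summary statistics
r  <- cor(x, y)                  # correlation coefficient R
b1 <- r * sd(y) / sd(x)          # slope = R * (sd of response) / (sd of explanatory)
b0 <- mean(y) - b1 * mean(x)     # the line passes through (mean x, mean y)

# The same line from lm()
fit <- lm(dist ~ speed, data = cars)
coef(fit)

# Prediction for a car traveling 15 mph, computed two ways
b0 + b1 * 15
predict(fit, newdata = data.frame(speed = 15))

# Residual = observed value - predicted value
head(cars$dist - fitted(fit))
r^2                              # proportion of variation in dist explained by the line
```

Here the slope has units of feet per mph and the intercept has units of feet, which is the sort of thing to state when interpreting them in context.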

Hypothesis Tests

  • 4-step process for conducting a hypothesis test
  • Expressing null and alternative hypotheses
  • Determining a p-value from a randomization distribution or using SE and normal distributions
  • Expressing the logic of a p-value in words (in the context of a particular example)
  • Statistical significance
  • Properties a randomization distribution must have
  • How to use physical things like coins and cards to create a randomization distribution
  • How to get RStudio to generate a randomization distribution
  • 1-tailed and 2-tailed tests (difference, when to use which)
  • Settings we have covered: 1-proportion, 2-proportions, 2-means
  • Important R functions: rflip(), do(), shuffle(), diffprop(), diffmean(), tally(), prop()
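
The R functions above fit together roughly as in the following sketch. It assumes the mosaic package and again borrows the KidsFeet data for the 2-means setting; the 1-proportion numbers (14 successes in 20 trials) are hypothetical and chosen only to have something to compute.

```r
library(mosaic)   # assumed installed
set.seed(145)     # arbitrary seed so the sketch is reproducible

# --- 2 means: do boys and girls differ in mean foot length? ---
# Observed test statistic
obs <- diffmean(length ~ sex, data = KidsFeet)

# Randomization distribution: shuffle the group labels, as the null hypothesis allows
rand_dist <- do(2000) * diffmean(length ~ shuffle(sex), data = KidsFeet)
gf_histogram(~ diffmean, data = rand_dist)

# 2-tailed p-value: proportion of shuffles at least as extreme as what we observed
prop(~ (abs(diffmean) >= abs(obs)), data = rand_dist)

# --- 1 proportion: hypothetical 14 successes in 20 trials, null proportion 0.5 ---
rand_flips <- do(2000) * rflip(20, prob = 0.5)

# 1-tailed p-value: proportion of simulated samples with 14 or more heads
prop(~ (heads >= 14), data = rand_flips)
```

For the 2-proportions setting the pattern is the same, with diffprop() in place of diffmean() and a categorical response.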

Normal Distributions

  • shape and how mean and standard deviation are related
  • 68-95-99.7 Rule
  • pnorm() and qnorm()
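
A short sketch of how pnorm() and qnorm() connect to the 68-95-99.7 Rule. The mean of 100 and standard deviation of 15 in the second half are hypothetical values used only for illustration.

```r
# Proportion of a normal distribution within 1, 2, and 3 standard deviations of the mean
within_1 <- pnorm(1) - pnorm(-1)   # about 0.68
within_2 <- pnorm(2) - pnorm(-2)   # about 0.95
within_3 <- pnorm(3) - pnorm(-3)   # about 0.997

# pnorm(): value -> proportion below it (hypothetical mean 100, sd 15)
below_130 <- pnorm(130, mean = 100, sd = 15)

# qnorm(): proportion -> value; it undoes pnorm()
qnorm(below_130, mean = 100, sd = 15)   # returns 130 again
qnorm(0.5, mean = 100, sd = 15)         # the median of a normal distribution is its mean
```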

Format and Logistics

The test will have a variety of question types. Possible question types include

  • True/False
  • Multiple choice
  • Short answer
  • Problems where you show your work leading to a numerical result

For short answer questions, I’m looking for answers that are

  • correct,
  • clear, and
  • concise (stay focussed, don’t tell me a bunch of things that are only sort of related)

Some example questions

Your homework problems (including ones you did not need to turn in) are a good source of example questions. So are the worksheets we have been using in class.

Here are a few more example problems from tests in past years.

  1. What do I do?

    In each of the following situations, pretend you want to know some information and you are designing a statistical study to find out about it. Give the following pieces of information for each: (i) what variables you would need to have in your data set, (ii) whether each variable is categorical or quantitative, (iii) the null and alternative hypotheses for a hypothesis test related to the question, (iv) whether the study is an observational study or an experiment.

    1. You want to know if boys or girls score better on reading tests in Kent County grade schools.

    Notes:

    • See this worksheet for some more scenarios.
    • Other things could be asked (like what plot to use to investigate the data, etc.)
    • For some of the scenarios above, you should be able to create a randomization distribution and use it to get a p-value.