Here is a list of things you should be sure you know how to do. It is not intended to be an exhaustive list, but it is an important list.
You should be able to:
Understand, use, and explain statistical terminology.
- Be sure to focus on important distinctions being made by terms like case vs. variable, categorical vs. quantitative, explanatory variable vs. response variable, statistic vs. parameter, sample vs. population, sample vs. sampling distribution, sampling distribution vs. bootstrap distribution, etc.
- Some other important terms: significance level, confidence level, margin of error, statistically significant, type I error, type II error, critical value, paired design, blinding, residual, correlation
Understand the issues involved in collecting good data and the design of studies, including the distinctions between observational studies and experiments.
Understand how confidence intervals are computed
- getting R to generate a bootstrap distribution
- using a bootstrap distribution to compute a confidence interval
- using standard error formulas to compute a confidence interval
- using summary information from a linear regression model
- determining good sample sizes for a desired margin of error
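The bootstrap and SE-formula approaches above can be sketched in base R. This is a minimal illustration with hypothetical data; the course's own idiom uses `do()` and `resample()` from the mosaic package, and `sample(..., replace = TRUE)` below plays the role of `resample()`.

```r
# Bootstrap percentile CI for a mean -- base-R sketch with hypothetical data
set.seed(123)
x <- c(12, 15, 9, 14, 11, 13, 16, 10, 12, 14)

# resample the data with replacement many times, recording the mean each time
boot_means <- replicate(5000, mean(sample(x, replace = TRUE)))

# 95% percentile interval: the middle 95% of the bootstrap distribution
ci <- quantile(boot_means, c(0.025, 0.975))
ci

# SE-formula alternative: statistic +/- critical value * SE
n  <- length(x)
se <- sd(x) / sqrt(n)
mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se
```

Both intervals should be similar when the sampling distribution is roughly symmetric; that agreement is itself a useful check.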
Understand what a confidence interval tells you
- meaning of confidence level
- recognizing incorrect ways to interpret a confidence interval and what is wrong with them.
- relationship between p-values and confidence intervals
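The p-value/confidence-interval relationship in the last bullet can be seen directly in R's `t.test()` output: a 95% confidence interval for a mean excludes the null value exactly when the two-sided test of that null gives p < 0.05. A small hypothetical example:

```r
# Duality of confidence intervals and two-sided tests (hypothetical data)
x  <- c(5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.2, 5.0)
tt <- t.test(x, mu = 4.5, conf.level = 0.95)

tt$conf.int   # does the 95% interval contain 4.5?
tt$p.value    # is the two-sided p-value below 0.05? The answers always agree.
```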
Use the 4-step process for conducting a hypothesis test, including
- expressing null and alternative hypotheses
- computing an appropriate test statistic
- how to get R to generate a randomization distribution
- determining a p-value from a randomization distribution
- determining a p-value using formulas (SE, Chi-squared, degrees of freedom)
- expressing the logic of a p-value in words (in the context of a particular example).
- the difference between 1-sided and 2-sided tests
- why we use upper tails for Chi-squared tests.
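The randomization-distribution steps above can be sketched in base R. The course's idiom is `do(1000) * diffmean(y ~ shuffle(group))` from the mosaic package; here `sample(group)` plays the role of `shuffle()`, and the data are hypothetical.

```r
# Randomization test for a difference in means -- base-R sketch
set.seed(42)
y     <- c(18, 21, 19, 22, 20, 27, 30, 26, 29, 28)   # hypothetical responses
group <- rep(c("A", "B"), each = 5)

# observed test statistic
obs_diff <- mean(y[group == "B"]) - mean(y[group == "A"])

# under H0 (no association), group labels are exchangeable: shuffle them
rand_diffs <- replicate(10000, {
  g <- sample(group)
  mean(y[g == "B"]) - mean(y[g == "A"])
})

# two-sided p-value: proportion of shuffles at least as extreme as observed
p_value <- mean(abs(rand_diffs) >= abs(obs_diff))
p_value
```

For a 1-sided test you would drop the `abs()` and count only the relevant tail.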
Perform and interpret Chi-squared tests: Chi-squared goodness of fit vs. Chi-squared for two-way tables
- How to compute expected counts
- Chi-squared test statistic
- degrees of freedom
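Both flavors of Chi-squared test can be run with base R's `chisq.test()`; the mosaic package's `xchisq.test()` shows the same results with expected counts printed in the table. The numbers below are hypothetical.

```r
# Goodness of fit: are 60 die rolls consistent with a fair die?
observed <- c(8, 12, 9, 11, 6, 14)
gof <- chisq.test(observed, p = rep(1/6, 6))
gof$expected   # expected counts: n * p_i = 60 * 1/6 = 10 for each face
gof$parameter  # df = (number of categories) - 1 = 5

# Two-way table: expected count = (row total * column total) / grand total
tab <- matrix(c(20, 30, 25, 25), nrow = 2)
tw  <- chisq.test(tab, correct = FALSE)
tw$expected
tw$parameter   # df = (rows - 1) * (cols - 1) = 1
```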
Perform and interpret 1-way ANOVA
- null and alternative hypotheses
- using `lm()` to fit the model
- computation of \(F\) statistic (ANOVA table, degrees of freedom, SS, MS, etc.)
- \(R^2 = \frac{SSM}{SST}\) and what it tells us
- Tukey’s Honest Significant Differences (`TukeyHSD()`) and why we use it
- checking assumptions (normality, equal standard deviation)
- residuals and residuals plots
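The ANOVA workflow above can be sketched end to end with hypothetical data (three groups, ten observations each):

```r
# One-way ANOVA via lm()/anova(), with Tukey HSD follow-up
set.seed(1)
d <- data.frame(
  y = c(rnorm(10, mean = 5), rnorm(10, mean = 6), rnorm(10, mean = 8)),
  g = factor(rep(c("a", "b", "c"), each = 10))
)

fit <- lm(y ~ g, data = d)
anova(fit)                 # ANOVA table: F statistic, df, SS, MS
summary(fit)$r.squared     # R^2 = SSM / SST

TukeyHSD(aov(y ~ g, data = d))   # all pairwise comparisons, family-adjusted

plot(fitted(fit), resid(fit))    # residual plot for checking assumptions
```

With 3 groups and 30 observations, the model has 2 df and the residuals have 27 df; checking that the table matches \(k - 1\) and \(n - k\) is a quick sanity test.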
Perform and interpret simple linear regression
- linear relationships and equations for lines (slope, intercept, etc.)
- hat notation (\(\hat y\), \(\hat \beta_1\), etc.)
- using `lm()` to fit the model
- computation of \(F\) statistic (ANOVA table, degrees of freedom, SS, MS, etc.)
- \(R^2 = \frac{SSM}{SST}\) and what it tells us
- correlation coefficient (\(R\))
- checking assumptions (LINE)
- residuals and residuals plots
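The regression items above fit into one short base-R sketch; the data are hypothetical, generated to have a roughly linear relationship.

```r
# Simple linear regression with lm()
set.seed(2)
x <- 1:20
y <- 3 + 0.5 * x + rnorm(20, sd = 1)   # hypothetical linear-plus-noise data

fit <- lm(y ~ x)
coef(fit)                  # estimates of the intercept and slope (hat-beta_0, hat-beta_1)
summary(fit)$r.squared     # R^2 = SSM / SST
cor(x, y)                  # correlation; for simple regression, cor(x, y)^2 = R^2
anova(fit)                 # F statistic, df, SS, MS

plot(fitted(fit), resid(fit))  # residuals vs. fitted, for the LINE conditions
```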
Check conditions/rules of thumb to see whether approximations (normal, t, Chi-squared) are good enough for our purposes.
Important functions to review include:
- `gf_histogram()`, `gf_boxplot()`, `gf_point()`
- `df_stats()`, `tally()`, `mean()`, `prop()`, `diffmean()`, `diffprop()`
- `pnorm()`, `pt()`, `pchisq()`, `qnorm()`, `qt()`, `qchisq()`
- `rbind()`
- `do()`, `resample()`, `shuffle()`
- `chisq.test()`, `xchisq.test()`, `t.test()`, `prop.test()`
- `lm()`, `msummary()`
- `mplot()`, `anova()`
Note that the test will be a sample from the possible topics; it is not possible to cover everything on the test.
The following formulas will be included on the test:
parameter type | one group | two groups |
---|---|---|
proportion | \(\displaystyle SE = \sqrt{\frac{p (1-p)}{n}}\) | \(\displaystyle SE = \sqrt{ \frac{p_1 (1-p_1)}{n_1} + \frac{p_2 (1-p_2)}{n_2}}\) |
mean | \(\displaystyle SE = \frac{\sigma}{\sqrt{n}}\) | \(\displaystyle SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\) |
You will need to know how to adjust these for use with confidence intervals and p-values and how to determine the degrees of freedom for t-distributions.
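As a worked use of the one-proportion formula above (with hypothetical numbers): the margin of error is the critical value times the SE, and solving \(ME = z^* \sqrt{p(1-p)/n}\) for \(n\) gives the sample-size calculation, usually with the conservative guess \(p = 0.5\).

```r
# 95% CI for one proportion, and sample size for a desired margin of error
p_hat <- 0.42          # hypothetical sample proportion
n     <- 500           # hypothetical sample size

se     <- sqrt(p_hat * (1 - p_hat) / n)   # SE formula from the table
z_star <- qnorm(0.975)                    # critical value for 95% confidence
me     <- z_star * se                     # margin of error
c(p_hat - me, p_hat + me)                 # interval: estimate +/- margin of error

# sample size so that ME <= 0.03, using p = 0.5 as the conservative guess
ceiling((z_star * 0.5 / 0.03)^2)
```

For means, the same pattern applies with `qt()` in place of `qnorm()` and the appropriate degrees of freedom.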