Lock5withR includes a data set with data from a student survey. It includes the following variables.
| variable | description |
|---|---|
| Year | Year in school |
| Sex | Student’s sex: ‘F’ or ‘M’ |
| Smoke | Smoker? ‘No’ or ‘Yes’ |
| Award | Preferred award: ‘Academy’, ‘Nobel’, or ‘Olympic’ |
| HigherSAT | Which SAT is higher? ‘Math’ or ‘Verbal’ |
| Exercise | Hours of exercise per week |
| TV | Hours of TV viewing per week |
| Height | Height (in inches) |
| Weight | Weight (in pounds) |
| Siblings | Number of siblings |
| BirthOrder | Birth order, 1=oldest |
| VerbalSAT | Verbal SAT score |
| MathSAT | Math SAT score |
| SAT | Combined Verbal + Math SAT |
| GPA | College grade point average |
| Pulse | Pulse rate (beats per minute) |
| Piercings | Number of body piercings |

For now, let’s focus on just two variables: Sex and Award. (Each student was asked whether they would rather win an Academy Award, a Nobel Prize, or an Olympic gold medal. Award records their answers.) If the members of your group were added to the data set (just for these two variables), what would the new rows of data look like?
Write down some questions we might answer using the Sex and/or Award variables. Which of your questions/answers need both variables? Which only require one of the variables?
Our main tools for investigating questions like these will be tally() for numerical summaries and gf_bar() for bar plots.
Run these commands to find out what each one produces.
library(Lock5withR) # Load the package that contains the data
library(mosaic) # Provides tally(); also loads ggformula, which provides gf_bar()
gf_bar( ~ Award, data = StudentSurvey)
tally( ~ Award, data = StudentSurvey)

Run the commands below to make numerical tables of different kinds.
tally( ~ Award | Sex, data = StudentSurvey, format = "percent")
tally( Award ~ Sex, data = StudentSurvey, format = "prop")
tally( Award ~ Sex, data = StudentSurvey, margins = TRUE)
tally( Award ~ Sex, data = StudentSurvey, margins = TRUE, format = "percent")

Which tables do you like best for this question?
When you use proportions or percents, be sure to check which things add up to 1 or 100%. (Possible answers: rows, columns, or the whole table.)
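One way to check is to compute the sums directly. The sketch below uses base R (table() and prop.table()) rather than tally(), on a small made-up data frame; the `toy` data and its values are hypothetical and are not taken from StudentSurvey. It shows the three possibilities side by side.

```r
# Hypothetical toy data, NOT taken from StudentSurvey
toy <- data.frame(
  Sex   = c("F", "F", "F", "M", "M", "M", "M", "F"),
  Award = c("Nobel", "Olympic", "Olympic", "Olympic",
            "Academy", "Olympic", "Nobel", "Academy")
)

# Two-way table of counts: rows are Award, columns are Sex
counts <- table(toy$Award, toy$Sex)

whole  <- prop.table(counts)              # proportions of the whole table
by_row <- prop.table(counts, margin = 1)  # proportions within each row
by_col <- prop.table(counts, margin = 2)  # proportions within each column

sum(whole)       # the whole table adds to 1
rowSums(by_row)  # each row adds to 1
colSums(by_col)  # each column adds to 1
```

Applying the same kind of check to the output of tally() tells you whether a table's proportions were computed within rows, within columns, or over the whole table.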
gf_bar() can create a variety of bar charts.
Try these examples.
gf_bar( ~ Award, data = StudentSurvey, fill = ~Sex)
gf_bar( ~ Award, data = StudentSurvey, fill = ~Sex, position = "dodge")
gf_bar( ~ Award | Sex, data = StudentSurvey, fill = ~Sex)
gf_bar( ~ Sex, data = StudentSurvey, fill = ~ Award)

Which do you like best for answering this question?
We can also use gf_props() or gf_percents() to make bar charts on a proportion or percent scale.
Try these (or use gf_percents() instead of gf_props() if you want percents instead of proportions):
gf_props( ~ Award, data = StudentSurvey, fill = ~Sex)
gf_props( ~ Award, data = StudentSurvey, fill = ~Sex, position = "dodge")
gf_props( ~ Award, data = StudentSurvey, fill = ~Sex, position = "dodge",
denom = ~fill)
gf_props( ~ Award | Sex, data = StudentSurvey, fill = ~Sex)
gf_props( ~ Sex, data = StudentSurvey, fill = ~ Award)
gf_props( ~ Sex, data = StudentSurvey, fill = ~ Award, denom = ~x)

In each case, determine which segments add to 1 (or 100 percent).
What does denom do?
A nationwide US telephone survey conducted by the Pew Foundation in October 2010 asked 2625 adults ages 18 and older, “Some people say there is only one true love for each person. Do you agree or disagree?” The survey participants were selected randomly and reached by landline and cell phone. In addition to the answer to the question, surveyors recorded the sex of each person surveyed.
What is the population for this study?
What are some potential sources of bias in this study? Do you expect the bias to be relatively small or potentially large?
What are the cases in this study?
What are the variables? Are they categorical or quantitative?
Write down what the first few rows of the data set would look like if your group members were the first few cases.
Of those surveyed, 735 people agreed, 1812 disagreed, and 78 answered “don’t know”.
It is important to distinguish between the proportion of people in the population who would answer a certain way and the proportion of people in our sample who did answer a certain way. We have terminology and notation to distinguish between the two.
| summary | parameter | statistic |
|---|---|---|
| proportion | \(p\) | \(\hat p\) (read: p hat) |
| mean | \(\mu\) (Greek letter “mu”) | \(\overline x\) (read: x bar) or \(\hat \mu\) |
| standard deviation | \(\sigma\) (Greek letter “sigma”) | \(s\) or \(\hat\sigma\) |
The notation for the median and the IQR is less standardized.
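As a small illustration of the notation, the sample proportion \(\hat p\) of respondents who agreed can be computed from the counts reported above. This is a minimal base R sketch; the variable names (`n`, `agree`, `p_hat`, and so on) are chosen here for illustration.

```r
# Counts reported for the Pew survey
n         <- 2625   # number of adults surveyed
agree     <- 735
disagree  <- 1812
dont_know <- 78

# The three answers should account for everyone surveyed
stopifnot(agree + disagree + dont_know == n)

# Sample proportion who agreed: a statistic, written p-hat
p_hat <- agree / n
p_hat
```

The value of `p_hat` describes only the 2625 people in the sample. The parameter \(p\), the proportion of all US adults who would agree, is not known from these data.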
Here is the two-way table for the Pew study.
| answer | Male | Female |
|---|---|---|
| agree | 372 | 363 |
| disagree | 807 | 1005 |
| don’t know | 34 | 44 |
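If you would like to work with this table in R, one possible approach is to enter the counts as a matrix. The sketch below uses base R only; the object name `pew` and the dimension labels are chosen here for illustration.

```r
# Enter the two-way table of counts from the Pew study
pew <- matrix(
  c(372,  363,
    807, 1005,
     34,   44),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    answer = c("agree", "disagree", "don't know"),
    sex    = c("Male", "Female")
  )
)

# Add row and column totals; the row totals should match
# the 735, 1812, and 78 reported earlier
addmargins(pew)

# Proportion giving each answer within each sex (each column adds to 1)
prop.table(pew, margin = 2)
```

Using `margin = 1` instead would give the proportion of each sex within each answer, so it is worth deciding which comparison a question calls for before choosing.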
Use the table to answer the following questions.