Dope Sheets
dope
- n. information, especially from a reliable source [the inside dope];
- v. figure out – usually used with out;
- adj. excellent
Here’s where you can come to find out what’s up each day. Homework assignments are available on the homework page.
Week 1
Tue, Sept 1
Meet outdoors between Spoelhof and the road, near the Jonah/fish/cheese sculpture.
Topic of the Day: Introduction to Stat 145
Before tomorrow’s class
- Read Chapter 1 of IMS. You don’t have to pick up every detail in this reading. Instead, focus on the following key terms.
- case (also called observational unit) vs. variable
- numerical (also called quantitative) variable vs. categorical variable
- population vs. sample
- experiment vs. observational study
- explanatory variable vs. response variable
- parameter vs. statistic
- bias
There is less terminology in statistics than in biology, but it must be used more carefully, so it is important that we understand these terms. Notice that most of these come in pairs. These pairs make some important distinctions. (Example of a distinction from biology: male vs. female makes an important distinction in many types of organisms.)
Note: Most of these terms can also be found in Sections 1.1-1.3 of ISLBS. Feel free to look there for a (slightly) different explanation and different examples.
- Do PS 0
This assignment is a little bit unusual. I don’t usually have things due before we discuss them, but I want to use this assignment to guide our discussion tomorrow. It will be graded for completion and effort, not for correctness. It will also give you a chance to try out Gradescope in a low stakes setting.
Wed, Sept 2
Before class: complete PS 0. Submit it via Gradescope, and also bring it with you.
Meet outdoors between Spoelhof and the road, near the Jonah/fish/cheese sculpture.
Handout: Data and Studies
Fri, Sept 4
Meet online (Teams) – video
Test Blackout Dates
- We’ve decided not to have tests on advising days, so now we need to pick the dates for our tests. You can help me avoid bad days by filling out this form where you can list dates that don’t work well for you. I can’t promise to avoid every person’s blackout dates, but I’ll try to avoid dates that seem to be a problem for several people in the class.
Groups for the day (first person start the meeting, others join)
- Group 1: Saul Miranda Valencia, Marian Henderson, Alyssa Dekker
- Group 2: Nathan Haverstick, Abigail Liebetreu, Lucas Walker
- Group 3: Bryce Reynolds, Abigail Strong, Claire Stannis
- Group 4: Brielle De Nooyer, Brant Van Noord, Christian Swaim
- Group 5: Alex Van Uffelen, Lyric Johnson, Elizabeth Griffen
- Group 6: Jared Van Noord, Elijah Faith, Clinton Jackson
- Group 7: Ana Li Warners, Robin Kollar, Cameron Massy
- Group 8: Samuel Ydenberg, Michael Akpabey, Brandon Turcotte
- Group 9: Hannah Brown, Sara Koenig, Eleanor Scheeres
- Group 10: Hayden Janssen, Jeffrey Arthur, Emma Thompson
Topic of the day: Exploring data with plots
- Google Presentation for your assignment.
- Details are in part C of the tutorial below.
- Plotting Tutorial (hosted online)
- Part A (Births) - We went through this part in class.
- Part B (NHANES) – This will give you a chance to try things with a different data set and lots of specific direction.
- Part C (more practice) – This will let you explore on your own and introduce a few more “extras”.
- Alternative (only use if hosted tutorial isn’t working for some reason): Run in RStudio console. Copy and paste one of the commands below – the one for the part you want to do.
learnr::run_tutorial('PlottingBasics2020A', package = 'StatTutor')
learnr::run_tutorial('PlottingBasics2020B', package = 'StatTutor')
learnr::run_tutorial('PlottingBasics2020C', package = 'StatTutor')
Week 2
Groups
Groups for this week (first person start the meeting, others join)
- Group 1: Saul Miranda Valencia, Marian Henderson, Alyssa Dekker
- Group 2: Nathan Haverstick, Abigail Liebetreu, Lucas Walker
- Group 3: Bryce Reynolds, Abigail Strong, Claire Stannis
- Group 4: Brielle De Nooyer, Brant Van Noord, Christian Swaim
- Group 5: Alex Van Uffelen, Lyric Johnson, Elizabeth Griffen
- Group 6: Jared Van Noord, Elijah Faith, Clinton Jackson
- Group 7: Ana Li Warners, Robin Kollar, Cameron Massy
- Group 8: Samuel Ydenberg, Michael Akpabey, Brandon Turcotte
- Group 9: Hannah Brown, Sara Koenig, Eleanor Scheeres
- Group 10: Hayden Janssen, Jeffrey Arthur, Emma Thompson
Mon, Sept 7
- video recording
- Announcements
- PS 01 (Gradescope) and PS 02 (Google presentation) are due tonight.
- The problems in IMS Chapter 2 were reordered this morning. I’ve adjusted my homework sheet to match, but let me know if you spot any errors on my part.
- Submit your test blackout dates soon. Fill out the form multiple times to submit multiple dates.
- Contribute to the text by submitting any errors you find. I’ll give a few bonus homework points to students whose contributions are frequent and high quality. Check the previous reports before submitting to avoid submitting duplicates.
- Topics of the Day
Wed, Sept 9
- Announcements
- PS 3A (RMarkdown) and PS 3B (IMS 2.1) are due tomorrow at 11:59, even though we don’t have class on Thursdays.
- Topic of the Day: Exploring Categorical Data
- Worksheets
- Summarizing Exercise (from last time): [HTML] [PDF]
- Summarizing Categorical Data (new): [HTML] [PDF]
- computing proportions – keep your eye on the denominator!
- bar charts – dodge or stack?
- pie charts – should we ever use them?
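To see why the denominator matters, here is a minimal sketch in base R. The counts, the labels, and the variable names are all made up for illustration; none of them come from the worksheet. The same cell count gives three different proportions depending on what it is divided by.

```r
# Hypothetical counts, invented for illustration only
counts <- matrix(c(30, 10,
                   20, 40),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("A", "B"),
                                 outcome = c("yes", "no")))

# Denominator: all 100 cases
overall <- prop.table(counts)

# Denominator: each row total (proportion of each outcome within a group)
by_group <- prop.table(counts, margin = 1)

# Denominator: each column total (proportion of each group within an outcome)
by_outcome <- prop.table(counts, margin = 2)

overall["A", "yes"]     # 30 / 100 = 0.30
by_group["A", "yes"]    # 30 / 40  = 0.75
by_outcome["A", "yes"]  # 30 / 50  = 0.60
```

All three numbers describe the same 30 cases, so it helps to say out loud which total is in the denominator before reporting a proportion.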
Week 3
Groups for the Week
- Group 1: Jared Van Noord, Alyssa Dekker, Alex Van Uffelen, Emma Thompson
- Group 2: Hyeon Kim, Clinton Jackson, Christian Swaim, Marian Henderson
- Group 3: Eleanor Scheeres, Elijah Faith, Bryce Reynolds, Cameron Massy
- Group 4: Jeffrey Arthur, Nathan Haverstick, Brandon Turcotte, Claire Stannis
- Group 5: Abigail Strong, Abigail Liebetreu, Samuel Ydenberg, Michael Akpabey
- Group 6: Sara Koenig, Brielle De Nooyer, Saul Miranda Valencia, Lyric Johnson
- Group 7: Robin Kollar, Ana Li Warners, Hayden Janssen, Brant Van Noord
- Group 8: Elizabeth Griffen, Hannah Brown, Lucas Walker
Mon, Sep 14
- Announcements
- PS 4 due today at 11:59 pm
- New Resource: R Examples
- Watch for news about tomorrow’s class
- Topic of the Day: Malaria Vaccine Case Study (IMS 2.4)
- Malaria Vaccine Case Study Worksheet: [HTML] [PDF]
Tue, Sep 15
- Announcements
Meet in CFAC Tent today
Combining PDFs for Gradescope: If you need to combine multiple PDFs into a single document to submit to Gradescope, you can find tools for this online. (Note: If you submit multiple times in Gradescope, each submission replaces the previous submission. This lets you submit a revised version if you find an error, or complete more of the assignment. But if you have your assignment in multiple parts – perhaps some done by hand and some done with R – you need to combine them before submitting.)
Here’s one PDF combiner that I found:
Let me know if you find something better and I’ll add it to the list.
- Topics of the Day
- Evaluating the evidence in the Malaria Vaccine Case Study
- Understanding the logic of this simulation is very important
- We will learn how to get R to do this much more quickly tomorrow.
- Fitting lines to data: [HTML] [PDF]
- Key idea: How does SSE measure how well a line fits a data set?
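One way to make the SSE idea concrete is to compute it by hand for two candidate lines. This is only a sketch: the data and the helper function `sse()` are hypothetical, written for this example, and are not taken from the worksheet.

```r
# Hypothetical data, invented for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Sum of squared errors for the line y = intercept + slope * x
# (sse() is a helper defined here, not a built-in R function)
sse <- function(intercept, slope, x, y) {
  predicted <- intercept + slope * x
  sum((y - predicted)^2)
}

sse_a <- sse(0, 2,   x, y)  # candidate line y = 2x
sse_b <- sse(1, 1.5, x, y)  # candidate line y = 1 + 1.5x

sse_a  # 0.11
sse_b  # 3.86
```

The line with the smaller SSE sits closer to the points overall, so here y = 2x fits these made-up data much better than y = 1 + 1.5x.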
Wed, Sep 16
- Announcements
- If you tried to submit multiple documents for PS 4, please combine them into a single PDF and resubmit. (Gradescope replaces previous submissions each time you submit.)
- Next PS due on Thursday
- Test 1 on Friday, Oct 2
- Topics of the Day
- Using simulations to evaluate evidence
- Least Squares Regression Lines
- Using SSE to find the least squares line
- lm() – R’s function for finding this line
- Residuals and residual plots
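As a rough illustration of these topics, the sketch below fits a least squares line with `lm()` and checks that its SSE is no larger than the SSE of another candidate line. The data are hypothetical, and the comparison line y = 2x is just an arbitrary guess chosen for this example.

```r
# Hypothetical data, invented for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Fit the least squares regression line
fit <- lm(y ~ x)
coef(fit)   # intercept 0.05, slope 1.99

# Residuals are observed minus predicted values
res <- resid(fit)
sse_fit <- sum(res^2)

# SSE for an arbitrary competing line, y = 0 + 2x
sse_other <- sum((y - (0 + 2 * x))^2)

sse_fit <= sse_other   # TRUE: no other line can have a smaller SSE

# Residual plot: residuals against x, with a reference line at 0
if (interactive()) {
  plot(x, res, ylab = "residual")
  abline(h = 0)
}
```

In a residual plot we hope to see points scattered around 0 with no clear pattern.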
Fri, Sep 18
- Announcements
- Topics of the Day
- Testing for a difference in proportions – [HTML] [PDF]
Week 4
Mon, Sep 21
Announcements
- Test 1 next Friday
- PS due tonight at 11:59pm
- PS 1 and PS 3A have been graded. More grading coming soon.
- Be sure to correctly indicate which problems are on which pages
Topics of the Day:
- Follow-up on Testing for a difference in proportions – [HTML] [PDF]
- generalizability
- does it matter that there are more right-handers? [case-control study designs]
- expressing statistical hypotheses with mathematical notation
- drawing conclusions by comparing test statistic to null distribution
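The logic of comparing a test statistic to a null distribution can be sketched with a shuffling simulation in base R. This example uses the Week 3 malaria vaccine study as the setting. The counts (14 vaccinated with 5 infected, 6 on placebo with 6 infected) are assumed here from memory of IMS 2.4, so check them against the text. The function `diff_in_props()` and the choice of 10,000 shuffles are also assumptions made for this sketch.

```r
set.seed(145)  # arbitrary seed so the shuffles are reproducible

# Assumed counts: 14 vaccine (5 infected), 6 placebo (6 infected)
treatment <- rep(c("vaccine", "placebo"), times = c(14, 6))
infected  <- c(rep(c(TRUE, FALSE), times = c(5, 9)),  # vaccine group
               rep(TRUE, 6))                          # placebo group

# Test statistic: infection rate on placebo minus infection rate on vaccine
# (a helper defined here for illustration)
diff_in_props <- function(infected, treatment) {
  mean(infected[treatment == "placebo"]) -
    mean(infected[treatment == "vaccine"])
}

observed <- diff_in_props(infected, treatment)

# Null distribution: shuffle the treatment labels, as if treatment had no effect
null_diffs <- replicate(10000, diff_in_props(infected, sample(treatment)))

# One-sided p-value: how often does shuffling give a difference at least
# as large as the observed one? (small tolerance guards against rounding)
p_value <- mean(null_diffs >= observed - 1e-12)

# Exact value for comparison: all 6 placebo slots filled from the 11 infected
exact <- choose(11, 6) / choose(20, 6)
```

If the p-value is small, a difference as large as the observed one would rarely arise from the shuffling alone, which counts as evidence against the null hypothesis.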
Tue, Sep 22
- PS 7 (due Thursday night) has been posted.
- Topics of the Day:
- Two facts about the regression line
- computing regression line from \(R\), \(\overline{x}\), \(\overline{y}\), \(s_x\), and \(s_y\)
- Outliers and Regression
- Regression: Predictions and Residuals – [HTML] [PDF]
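Here is a small sketch of computing the regression line from the five summary statistics. It uses the standard relationships slope \(= R \cdot s_y / s_x\) and intercept \(= \overline{y} - \text{slope} \cdot \overline{x}\), and then compares the result with `lm()`. The data are hypothetical, and `r` in the code plays the role of \(R\).

```r
# Hypothetical data, invented for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# The five summary statistics
r     <- cor(x, y)
x_bar <- mean(x)
y_bar <- mean(y)
s_x   <- sd(x)
s_y   <- sd(y)

# Regression line from the summaries
slope     <- r * s_y / s_x
intercept <- y_bar - slope * x_bar

# Compare with the line that lm() finds
fit <- lm(y ~ x)
all.equal(c(intercept, slope), unname(coef(fit)))   # TRUE
```

Because the intercept is defined as \(\overline{y} - \text{slope} \cdot \overline{x}\), the line always passes through the point \((\overline{x}, \overline{y})\).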
Fri, Sep 25
- Announcements
- PS 8 will be a bit different
- Due next Tuesday at 11:59pm
- PS 8 will be done in Moodle (so it can be auto-graded in time for your test preparation).
- The assignment should be posted later today and will include some material that we won’t cover until Monday.
- No assignment due next Thursday (you will be preparing for your test).
- I expect you will see graded homework trickling in to Gradescope over the next few days. The grader is trying hard to get it all graded before your test.
- Test next Friday
- Study guide has been posted
- Details regarding logistics next week
- Next Tuesday and Wednesday will be mainly review and practice. I’ve scheduled the CFAC tent again, and we will meet there unless the weather doesn’t cooperate.
- Topic of the Day: Null Distributions and Normal Distributions
Week 5
Groups for the Week
- Group 1: Lucas Walker, Samuel Ydenberg, Abigail Liebetreu, Jeffrey Arthur
- Group 2: Brant Van Noord, Brielle De Nooyer, Ana Li Warners, Claire Stannis
- Group 3: Hayden Janssen, Hyeon Kim, Bryce Reynolds, Nathan Haverstick
- Group 4: Cameron Massy, Brandon Turcotte, Eleanor Scheeres, Lyric Johnson
- Group 5: Jared Van Noord, Christian Swaim, Marian Henderson, Michael Akpabey
- Group 6: Elizabeth Griffen, Clinton Jackson, Emma Thompson, Elijah Faith
- Group 7: Robin Kollar, Saul Miranda Valencia, Alyssa Dekker, Abigail Strong
- Group 8: Sara Koenig, Alex Van Uffelen, Hannah Brown
Mon, 9/28
- Announcements
- Test on Friday will use Moodle/Respondus/Gradescope combo
- I’ll create a practice item so you can see how the system works
- Meet in CFAC Tent tomorrow – bring your laptop if you have one
- Topics of the Day
Wed, 9/30
- Meet in CFAC Tent
- Some Reminders:
- Use today to practice for the test and get questions answered. Possible ways to do that:
Week 6
Mon, 10/5
- New groups:
- Group 1: Cameron Massy, Jeffrey Arthur, Abigail Liebetreu, Saul Miranda Valencia
- Group 2: Michael Akpabey, Christian Swaim, Eleanor Scheeres, Ana Li Warners
- Group 3: Elijah Faith, Lyric Johnson, Bryce Reynolds, Nathan Haverstick
- Group 4: Brandon Turcotte, Robin Kollar, Sara Koenig, Alex Van Uffelen
- Group 5: Hyeon Kim, Marian Henderson, Abigail Strong, Hayden Janssen
- Group 6: Samuel Ydenberg, Emma Thompson, Alyssa Dekker, Hannah Brown
- Group 7: Clinton Jackson, Brant Van Noord, Jared Van Noord, Elizabeth Griffen
- Group 8: Claire Stannis, Lucas Walker, Brielle De Nooyer
- Announcements:
- Test 1 has been graded.
- Meet in the tent tomorrow (pending reservation confirmation)
- Next problem set due Thursday night (will be posted later today once tent reservation is confirmed)
- Topic of the Day: Probability
- What is probability?
- Some probability rules
- Note: IMS does not have a probability section but ISLBS does. You can download ISLBS for free as a PDF.
Wed, 10/7
- Topic of the Day: Probability and Biology
Worksheet: [HTML] [PDF]
Focus your attention on the first 3 problems
- Bob’s boxes
- Breast cancer screening
- Disease testing
If you finish those three sections, work on some of the remaining problems, which reinforce probability tools we have already learned but show ways they come up in biological settings.
There are some comments/hints/solutions to these below. Take a look at them after each section before moving on to the next.
Comments/Hints/Solutions
Box A has 9 balls; Box B has 7 balls. The boxes are equally likely to be selected, but since Box A has more balls than Box B, each individual ball in Box A is less likely to be chosen than each individual ball in Box B.
If it helps, imagine an extreme case: Suppose Box A had 1 million balls and Box B had 1. The balls in Box A would be very unlikely to be selected – even if Box A were chosen. The ball in Box B would be chosen half the time – every time Box B is chosen.
Note 1: The color of the balls is not the issue here. We are asking about the individual balls, not their color.
What’s the point? We cannot use the equally likely rule by counting balls since the balls are not equally likely.
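The arithmetic behind this can be sketched in a few lines of R. The sketch assumes that once a box is selected, each ball in that box is equally likely to be drawn; the variable names are made up for this example.

```r
p_box <- 1/2   # each box is equally likely to be selected

# Probability that one particular ball is drawn:
# P(box) * P(that ball | box)
p_ball_A <- p_box * (1 / 9)   # a specific ball in Box A: 1/18
p_ball_B <- p_box * (1 / 7)   # a specific ball in Box B: 1/14

p_ball_A < p_ball_B           # TRUE: balls in Box A are less likely

# The 16 individual probabilities still add up to 1
total <- 9 * p_ball_A + 7 * p_ball_B

# Extreme case from above: 1 million balls in Box A, 1 ball in Box B
p_ball_A_extreme <- p_box * (1 / 1e6)
p_ball_B_extreme <- p_box * 1
```

Since 1/18 and 1/14 differ, counting balls and dividing by 16 would give the wrong answer.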
We can get some more probabilities if we use the product rule: \(\operatorname{P}(E \operatorname{and}F) = \operatorname{P}(E) \cdot \operatorname{P}(F \mid E)\)
Now we can get even more probabilities using \(\operatorname{P}(E \operatorname{or}F) = \operatorname{P}(E) + \operatorname{P}(F) - \operatorname{P}(E\operatorname{and}F)\). Do you see how? (Note: this equation involves 4 probabilities. If you ever know 3 of them, you can solve for the fourth.)
We want \(\displaystyle \operatorname{P}(A \mid R) = \frac{\operatorname{P}(A \operatorname{and}R)}{\operatorname{P}(R)}\). The top part of that fraction is in our inventory. But how do we get the bottom part?
A red ball could have come from either box, so let’s use \(R = (A \operatorname{and}R) \operatorname{or}(B \operatorname{and}R)\) and our rule for unions.
\(\operatorname{P}(R) = \operatorname{P}((A \operatorname{and}R) \operatorname{or}(B \operatorname{and}R)) =\) ________ + __________ - _________
Combining everything we get
\(\operatorname{P}(A \mid R) = \displaystyle \frac{\operatorname{P}(A \operatorname{and}R)}{\operatorname{P}(R)}\) \(= \displaystyle \frac{\operatorname{P}(A) \cdot \operatorname{P}(R \mid A)}{\operatorname{P}(A) \cdot \operatorname{P}(R \mid A) + \operatorname{P}(B) \cdot \operatorname{P}(R \mid B)}\) \(= \displaystyle \frac{\frac12 \cdot \frac79}{\frac12 \cdot \frac79 + \frac12 \cdot \frac37}\) \(= \displaystyle \frac{49}{76} \approx\) 0.6447
Notice how the question has the color in the condition and the solution has the box in the condition. This is our first example of a problem that flips those around. The general approach is similar in all of these problems and is usually credited to Rev. Thomas Bayes. (The formal theorem involved is called Bayes’ Theorem, and there is an entire branch of statistics called Bayesian statistics based on this idea. You can learn about that in Stats 341.)
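The same calculation can be checked in R. This is just a sketch of the arithmetic above, with variable names chosen for this example; the numbers \(\frac12\), \(\frac79\), and \(\frac37\) are the ones used in the solution.

```r
# P(box): each box is equally likely
prior <- c(A = 1/2, B = 1/2)

# P(red | box), from the solution above
p_red_given_box <- c(A = 7/9, B = 3/7)

# Product rule: P(box and red) = P(box) * P(red | box)
joint <- prior * p_red_given_box

# A red ball comes from either Box A or Box B, so add the two pieces
p_red <- sum(joint)

# Flip the condition: P(box | red) = P(box and red) / P(red)
posterior <- joint / p_red

round(posterior["A"], 4)   # 0.6447
```

Changing the entries of `prior` or `p_red_given_box` lets you reuse the same three steps for the breast cancer screening and disease testing problems.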
The approach used above is very algebraic. We can organize the same arithmetic visually using the method of probability trees.
First we set up the tree in this video.
Note: This video uses \(\cap\) to mean “and”. (That’s a common mathematical notation.)
Now that we have our tree, we can use the tree to get the probability we want in this video.