class: center, middle, inverse, title-slide

# Using Primary and Secondary Literature in Statistics and Data Science Classes

### Randall Pruim

### Stacy DeRuiter

### USCOTS 2021

---

# Before we get started

* You are in the right place if you are looking for

.large[.center[
*Using Primary and Secondary Literature<br>in Statistics and Data Science Classes*
]]

* Slides are available at

.large[.center[<https://bit.ly/uscots21-stat-lit>]]

* We will be using Poll Everywhere for a few polls

.large[.center[<https://pollev.com/stacyderuite335>]]

* Feel free to post questions/comments in the chat as we go along.

---

# Who we are

<table>
<tr>
<td>
<img src = "images/rpruim-square500.jpg" height = 200> <br>
Randall Pruim<br>
<a href="mailto:rpruim@calvin.edu">rpruim@calvin.edu</a>
</td>
<td>
<img src = "images/sld.jpg" height = 200><br>
Stacy DeRuiter<br>
<a href="mailto:sld33@calvin.edu">sld33@calvin.edu</a>
</td>
</tr>
</table>

**Calvin University**

* Primarily private undergraduate institution with ~ 3200 students.
* Over half of the students take at least one statistics course.

???

* Planning to introduce an online master's in data science in 2022.

---

# Who we are not

* We are not experts
    * We have tried some things; some have worked better than others

--

* We are not the only ones doing this
    * We are hoping some of you can share your experience as well

---

# Who else is here?

---

<!-- where are you? -->
<iframe src="https://embed.polleverywhere.com/clickable_images/GBVNwunchXoyuoV12PMln?controls=none&short_poll=true" width="960px" height="540px"></iframe>

---

<!-- what institution are you from? -->
<iframe src="https://embed.polleverywhere.com/multiple_choice_polls/PKHFXiuD6XzKJ4W9QYtw4?controls=none&short_poll=true" width="960px" height="540px"></iframe>

---

<!-- how many articles? -->
<iframe src="https://embed.polleverywhere.com/multiple_choice_polls/pN2R4iqTULCg1Nq0ciDej?controls=none&short_poll=true" width="960px" height="540px"></iframe>

---

<!-- how many articles, non-intro?
-->
<iframe src="https://embed.polleverywhere.com/multiple_choice_polls/jRyPF7rGJ2adrYb0weL2a?controls=none&short_poll=true" width="960px" height="540px"></iframe>

---

<!-- which fields give good articles?
<iframe src="https://embed.polleverywhere.com/free_text_polls/U2XDV01onRe6xHMgCEl39?controls=none&short_poll=true" width="960px" height="540px"></iframe>
-->

<!-- barriers to using more articles -->
<iframe src="https://embed.polleverywhere.com/discourses/FyrthYvXSPawUF47CctzE?controls=none&short_poll=true" width="960px" height="540px"></iframe>

---

# Outline

**1. [Motivation](#motivation)**: Why get students reading statistical literature?

**2. [Some examples](#examples-list)** from our classes

**3. [Finding resources and creating activities](#good-resources)**

**4. [Working together](#repo)**: A simple repository (contributions welcome!)

**5. Discussion**

---
name: motivation

## Why use primary and secondary sources?

* Topical (e.g., COVID, elections, science news)

--

* Interesting

--

* Invites students into the statistics community

--

* Models good practices (if selected well)

--

* **Most of our students will read more statistics than they produce**.

???

Possible polls:

* barriers to using more articles
* wish they used more articles than they actually use

---

# Teaching students to read (statistics)

Students need some guidance when they start reading statistics so they can

* know what to look for

--

* know what to ignore
    * many articles may contain a mix of things they do and do not know

--

* see the connections between what they know and how it appears in reports
    * reports are generally much terser than textbooks
    * students may need help filling in some details

---

## Reading Statistics: 7 Critical Components

Modified from page 16 of ***Seeing Through Statistics***, 2e by Jessica Utts.

1. Source of **funding** (Why was the research done? Who paid for it?)
2. **Researcher contact** (How did researchers interact with subjects?)
3. **Individuals studied** and how they were selected
4. **Measurements** made (questions asked)
5. **Setting** in which the data were collected
6. **Extraneous differences** and other explanations
7. **Magnitude of** claimed **effect**

See also <https://rpruim.github.io/s145/articles/Reading-Statistics.html>

???

* also 7 things to include if you are **writing**
* The level of detail will vary depending on the audience of the report. Much of this information might be missing in a news report, but it should be present in a good scientific paper and in the better news reports.

See the URL above for an example of how this can be presented to students (with some additional elaboration for each item).

---
name: examples-list

# Some Examples

Organized by how they might get used in a course

* [Reviewing via literature](#case-review)
* [One-off](#case-one-off)
* [Assessment](#case-assessment)
* [Biostat Discussion Series](#case-biostat-discussion-series)
* [Web Module](#case-web-module)
* [Assignment Cycle](#case-assignment-cycle)

---
name: case-review

# Example: Review via literature

source: <https://rpruim.github.io/s145/articles/Reading-Statistics.html>

Includes several articles

* Yearbook smiling
* **Malaria**
* Alzheimer's
* Gratitude
* Sea birds and an Oil Spill
* **COVID-19 vaccines**

Also some preamble

* 7 critical components (with some additional detail)
* Figuring out statistical methods

???
Switch slide share after this slide

---
name: case-one-off

# Example: One-off

*Objective: identify study type and design using realistic examples*

<p class="aligncenter">
<img src="https://www.mlive.com/resizer/IeOh0bTaifrnHCK7F1XyA9KsTe8=/1280x0/smart/image.mlive.com/home/mlive-media/width600/img/sports_impact/photo/gabriel-richard-football-practice-7bfedb097016cd3a.jpg" height="300">
<img src="https://media.npr.org/assets/img/2018/09/13/gettyimages-181825245-cfc961d320071035da7c57de288df1c48b8b48b3-s1100-c15.webp" height="300"></p>

---

# Example: One-off

Instructions: <https://stacyderuiter.github.io/s243-2021/study-design-case-studies.html>

1. What do you think is the *population* of interest for this study? (Remember, the population can usually be described as, "*All* of the ____")
2. How was the sample chosen? (Was it a simple random sample, a stratified sample, a convenience sample, ... ?)
3. Do you think it is reasonable to draw conclusions about the population of interest with this sample? State any reasons why you think the sample might not be representative (you can also think of these as sources of *sampling bias*.)
4. What kind of study was carried out: Experimental or Observational? ...

...

<!-- If experimental, how were treatments assigned: was it randomized? Was there blocking? If observational, was it prospective (subjects were chosen for the study and then tracked over time) or retrospective (data collected after the fact)? -->

<!-- 6. Does this study allow conclusions about *causation* to be drawn, or only *association* between explanatory and response variables? -->

<!-- 7. What is the main point conveyed by the news article you read? Considering your answers above, do you have any reason to question the conclusion, or suggest a different headline?
-->

[↩](#examples-list)

---
name: case-assessment

# Example: Assessment

In 2015, Beis and colleagues published a paper titled, *Brain serotonin deficiency leads to social communication deficits in mice.* They say, "A deficit in brain serotonin is thought to be associated with deteriorated stress coping behaviour, affective disorders and exaggerated violence. We challenged this hypothesis in mice with a brain-specific serotonin depletion." Their study included mice with three genotypes: one with two normal copies of a gene, *Tph2*, related to brain serotonin levels; one with one copy of the gene; and one with none (resulting in <5% of normal brain serotonin levels). They observed the mice in controlled experimental social situations, and measured variables including the number of social contacts the mice had (`n_contacts`) and the total time (in sec.) the mice spent socializing (`social_time`). They also recorded the mice's `sex` (male or female) and genotype (`serotonin_genes`).

---

# Example: Assessment

You can read [the data](https://sldr.netlify.app/data/mice.csv) into R with:

```r
mice <- read.csv('https://sldr.netlify.app/data/mice.csv')
```

---

# Example: Assessment

**A) (6 points)** The authors present the data on **`topic[1]`** as a function of genotype in the figure below. The error bars indicate +/- 1 standard error. Consider their work, then use methods learned in class to create your own graph that is a good or better alternative display of the same variables. Accompany your figure with a brief explanation of the differences between your figure and the published one, and advantages of your graph.

![](images/mouse-serotonin.png)

---

# Example: Assessment

<!-- Normally I wouldn't put a text block like this on screen but I want it as the "image" to talk about - LMK if you think this is a really bad idea. I won't be reading or detailing it all - just illustrate how versatile it can be / once you introduce a scenario you can ask ALL the q with it.
-->

**B) (4 points)** Describe the distribution of `var` for genotype `gtype`. (Try to include enough detail so that someone reading your answer could make a decent sketch of it.)

<!-- For reference, you may make a new figure or use the one you made for part A. -->

**C) (16 points)** Choose a PDF whose shape and support would match that of the distribution you just described, and justify your choice (4 points). Fit this PDF to the data by a method of your choice, showing all your work and clearly reporting the resulting parameter estimates (8 points). Include a plot of your fitted distribution overlaid on a histogram of the data, and comment on the goodness of fit (4 points).

**D) (22 points)** Carry out a hypothesis test to answer: **`question[1]`** Show all details of the test: state your hypotheses (4 points); report your sample statistic (2 points); check any required conditions (6 points); show all work done to complete the test (6 points); state the p-value and your conclusion in context (4 points).

---

# Example: Assessment (Multimedia)

<https://stacyderuiter.github.io/s243-2021/vents.html>

<img src="images/embed-npr.png" width="85%" />

[↩](#examples-list)

---
name: case-biostat-discussion-series

## Example: Biostat Discussion Series

**Guided Reading - Small Groups - Report Out**

<!-- Note: may expand to one slide per bullet with an image to accompany -->

- **Graphics critique; display & describe distributions.** Muth et al. 2016, [Bees tasting pollen](http://rsbl.royalsocietypublishing.org/content/roybiolett/12/7/20160356.full.pdf)
- **Graphics, study design, randomization tests.** Mönkkönen et al. 2009, [Bird nest predation](http://rsbl.royalsocietypublishing.org/content/roybiolett/5/2/176.full.pdf)
- **Graphics, inference for 2 proportions.** Thombs et al. 2015, [Self-citation during peer review](https://www.sciencedirect.com/science/article/abs/pii/S0022399914003468)
- **Study design, inference using randomization & t-stats.** Barros et al.
2014, Seabirds and oil spill: [primary](https://royalsocietypublishing.org/doi/10.1098/rsbl.2013.1041) and [secondary](http://www.nature.com/news/bird-reproduction-collapsed-after-oil-spill-1.15130) sources
- **Inference with chi-square stat, bootstrap, and randomization.** Carbone et al. 2008, Sabretooth cat sociality: [article](https://royalsocietypublishing.org/doi/abs/10.1098/rsbl.2008.0526), [reply](http://rsbl.royalsocietypublishing.org/content/5/4/561) and [rebuttal](http://rsbl.royalsocietypublishing.org/content/5/4/563.full)
- **Graphics, ANOVA (preview).** Emmons and McCullough 2003, [Counting blessings vs. burdens](https://psycnet.apa.org/record/2003-01140-012)

[↩](#examples-list)

---
name: case-web-module

# Example: Web Module

[**`learnr`**-ing ANOVA with gratitude](https://rsconnect.calvin.edu:3939/connect/#/apps/2e992b02-2ada-46f6-a04b-89e36df587ee/access)

<!-- Note: plan to follow link and page through tutorial briefly to illustrate its structure -- including video, data, links, images, bite-size chunks -->

<img src="images/gratitude-anova-screenshot.png" width="75%" />

<https://github.com/stacyderuiter/StatTutor>

[↩](#examples-list)

---
name: case-assignment-cycle

# Example: Assignment Cycle

**Reading + Forum --> Graphics | Models | Critique**

<p class="aligncenter">
<img src="https://www.whatcomtalk.com/wp-content/uploads/2018/02/Book-Discussion-Icon.png" height="360">
<img src="images/crit-cycle.png" height="360"></p>

---

# Example: Assignment Cycle

<!-- L.A.R.A.
in stats -->

### People, data, ethics

- Oaths centering people and diversity: Virginia Eubanks' [Oath of non-harm](https://virginia-eubanks.com/2018/02/21/a-hippocratic-oath-for-data-science/), [a proposal](https://thedataist.com/a-proposal-for-data-science-ethics/) from The Dataist, [an oath for data science](https://www.ncbi.nlm.nih.gov/books/NBK532770/box/box_D-2/?report=objectonly) from the National Academies of Sciences

<div style="max-width:429px"><div style="position:relative;height:0;padding-bottom:56.25%"><iframe src="https://embed.ted.com/talks/lang/en/rocio_lorenzo_how_diversity_makes_teams_more_innovative" width="429" height="240" style="position:absolute;left:45%;top:0;width:100%;height:100%" frameborder="0" scrolling="no" allowfullscreen></iframe></div></div>

- Data: Hofstra et al. 2020, [*The Diversity-Innovation Paradox in Science*](https://www.pnas.org/content/117/17/9284)

---

# Example: Assignment Cycle

<!-- L.A.R.A. in stats -->

### Statistical consulting

<img src="images/consulting.png" width="75%" />

Wang et al. 2018, [Researcher Requests for Inappropriate Analysis and Reporting: A U.S. Survey of Consulting Biostatisticians](https://www.acpjournals.org/doi/10.7326/M18-1230)

---

# Example: Assignment Cycle

### Weapons of Math Destruction - Algorithms of Oppression

- Excerpt from Safiya Noble's *Algorithms of Oppression*: "The cultural power of algorithms".
<iframe style="display: block; margin: auto;" width="420" height="236" src="https://www.youtube.com/embed/_2u_eHHzRto" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

Epstein and Robertson, [The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections](https://www.pnas.org/content/112/33/E4512)

---

# Example: Assignment Cycle

### Automating Inequality

- Excerpt from Virginia Eubanks' *Automating Inequality*

- OR -

<iframe height="200px" width="100%" frameborder="no" scrolling="no" seamless src="https://player.simplecast.com/31240d34-5a0d-4eb3-8a16-3421910cf119?dark=true"></iframe>

Data from Indiana FSSA (kind-of as cited in the book)

---

# Example: Assignment Cycle

### Data Feminism

Ch. 5: [Unicorns, Janitors, Ninjas, Wizards and Rock Stars](https://data-feminism.mitpress.mit.edu/pub/2wu7aft8/release/2)

<img src="images/data-fem.jpeg" width="25%" style="display: block; margin: auto;" />

**Data: got an idea?**

---

# Example: Assignment Cycle

### Recidivism

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQ8bPoaN1BJTqU08pm1tzbpcexZIJlbkw-w4w&usqp=CAU" width="25%" style="display: block; margin: auto;" />

- Dressel and Farid 2018, [The accuracy, fairness, and limits of predicting recidivism](https://advances.sciencemag.org/content/4/1/eaao5580)

---

# Example: Assignment Cycle

### Race after Tech

- Interview with Ruha Benjamin, author of *Race after Technology*

<iframe style="display: block; margin: auto;" width="420" height="236" src="https://www.youtube.com/embed/7acvCo1lkHk" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

**Data: got an idea?**

---

# Example: Assignment Cycle

### Harmony Square

<img
src="https://misinforeview.hks.harvard.edu/wp-content/uploads/2020/11/Roz-fig-1-final.png" width="75%" style="display: block; margin: auto;" />

Roozenbeek and Van Der Linden 2020, [Breaking Harmony Square: A game that “inoculates” against political misinformation](https://misinforeview.hks.harvard.edu/article/breaking-harmony-square-a-game-that-inoculates-against-political-misinformation/)

[↩](#examples-list)

---
name: good-resources

# What makes a good resource?

* Open access
* Interesting to students
* Connection to course themes/topics
* Topical, relevant, historically important, etc.
* Readable with minimal background (or background common to all students in the class)
* Models good statistical practices
* Data are available - simulating similar data can be a fallback in some cases

???

Might not achieve all of these every time.

---

# Finding good resources

* Journals with a good track record

---

# Open Access...to Data?

<img src="images/castro-2017-oa-data.png" width="75%" />

Castro et al. 2017, [Evaluating and Promoting Open Data Practices in Open Access Journals](https://utpjournals.press/doi/abs/10.3138/jsp.49.1.66)

---

# A few good journals?

<img src="images/jamboard.png" width="85%" />

Share your ideas: <https://bit.ly/stat-lit-jamboard>

???

<!-- which journals do you know of that have short, readable articles and open data? -->
<iframe src="https://embed.polleverywhere.com/discourses/dLtFUYRKvPxUC6WYGtyOp?controls=none&short_poll=true" width="960px" height="540px"></iframe>

???

Switch slide share during jamboard

---

# Finding good resources

* Journals with a good track record
* Textbooks
    * Many textbook exercises are based on references you can locate.
    * Data might already be provided by the authors

--

* Get by with a little help from your friends
    * Beg/Borrow/Steal from colleagues
    * Isostat mailing list

???

Refilling soup bowls and malaria were examples/exercises in textbooks.

<!-- # Creating an activity -->

<!-- Anything we want to say here?
Or is the combination of what we already have and -->

<!-- what will come up during the examples sufficient? -->

<!-- --- -->

---
name: some-advice

# Some advice

--

### .right[(that we don't always follow, but probably should)]

--

1. Start early
    * students: evaluate figures, descriptive statistics, identify variables
    * faculty: don't leave this to the end of course planning

---

# Some advice

### .right[(that we don't always follow, but probably should)]

1. Start early
2. Integrate into course
    * build several activities or segments of classes around bits of an article
    * can be a way to "save time"
    * also makes it feel like reading is "part of the course"

---

# Some advice

### .right[(that we don't always follow, but probably should)]

1. Start early
2. Integrate into course
3. Take notes
    * what went well/poorly? which articles are working best? incremental improvements?
    * where did I find this article/data/etc?

---

# Some advice

### .right[(that we don't always follow, but probably should)]

1. Start early
2. Integrate into course
3. Take notes
4. Work together
    * Ask colleagues in client disciplines
    * Share your good finds widely

---
name: repo

# Towards a Repository?

**re·pos·i·to·ry** | \ ri-ˈpä-zə-ˌtȯr-ē \

1. a place, room, or container where something is deposited or stored

<br>

It would be great to have a repository of articles/data sets/activities, but...

* Curation takes some time and resources
* Different instructors use different tools and employ different styles

<br>

Perhaps a little crowdsourcing can produce something useful.

???

Nothing so fancy as CRAN.

---

# How to contribute

Resources at

.large[.center[<https://bit.ly/uscots21-stat-lit-resources>]]

Populated with values from a Google Form

.large[.center[
<https://bit.ly/uscots21-stat-lit-form>
]]

* We would love to have you add some resources to our list!
---

# Thanks

### Contact Info

<table>
<tr>
<td>
<img src = "images/rpruim-square500.jpg" height = 150> <br>
Randall Pruim<br>
<a href="mailto:rpruim@calvin.edu">rpruim@calvin.edu</a>
</td>
<td>
<img src = "images/sld.jpg" height = 150><br>
Stacy DeRuiter<br>
<a href="mailto:sld33@calvin.edu">sld33@calvin.edu</a>
</td>
</tr>
</table>

.large[.center[
slides: <https://bit.ly/uscots21-stat-lit>]]

.large[.center[
Google form: <https://bit.ly/uscots21-stat-lit-form>
]]