Final Exam (Take Home)

The Electric Company

Link in case embedded video isn’t working

(Yes, this was state-of-the-art TV when I was a kid.)

This problem is about a study of the effect of watching the PBS TV show The Electric Company on kids’ reading performance. To keep things a bit simpler, we will just use the data from one grade: grade 2.

This study used a randomized, paired design: At each school, the two worst performing classes were selected, and it was randomly determined which class would watch The Electric Company (treatment group) and which one would not (control group). In the ElecComp data set, id is a numeric code for each school, and you can see that there are two classes, one in each group, at each school.

Students were given a reading test before (pre) and after (post) the experimental period. The data provide mean scores on these tests for each class. (It would be even better to have individual-level data, but that’s not what we have.)

The data

The data live (undocumented) in the CalvinBayes package. Let’s select the second grade and take a look.

Grade2 <- ElecComp %>% 
  filter(grade == 2) 

Grade2 %>% arrange(id) %>% reactable::reactable()

Grade2 %>% gf_bar( ~ id, fill = ~ group) %>%
  gf_theme(legend.position = "top")

Note that the id numbers are not contiguous and don’t start at 1. Let’s fix that and also add a group_idx variable and recode the missing values in method so Stan doesn’t complain about them:

library(forcats) # tools for categorical variables
Grade2 <-
  Grade2 %>% 
  mutate(
    orig_id = id,
    id = as.integer(factor(id)),
    group_idx = as.integer(factor(group)),
    method = fct_explicit_na(method, na_level = "-"),
    method_idx = as.integer(factor(method))
  )

Grade2 %>% reactable::reactable()

And a few more data summaries for good measure.

df_stats(pre + post ~ group , data = Grade2)

response	group	min	Q1	median	Q3	max	mean	sd	n
pre	control	40.8	63.225	72.85	81.175	90.0	70.70882	13.60787	34
pre	treatment	51.4	69.725	75.35	83.100	97.7	75.89706	10.82190	34
post	control	67.6	85.475	97.20	102.975	110.3	93.21176	11.93149	34
post	treatment	66.4	96.275	102.05	109.125	114.6	101.57059	10.24226	34

gf_point(post ~ group, data = Grade2) %>%
  gf_line(group = ~ id, alpha = 0.5) |
gf_point((post - pre) ~ group, height = 0, width = 0.2, data = Grade2) %>%
  gf_line(group = ~ id, alpha = 0.5)

1. What are the lines on those last two plots and why are they useful?

Two Models

2. Let’s fit two models. (Feel free to adjust things like the number of iterations, etc. as needed.) Show the precis() output for each model.

ec1 <- 
  ulam(
    data = Grade2,
    alist(
      post ~ dnorm(mu, sigma),
      mu <- pre + bg[group_idx],
      bg[group_idx] ~ dnorm(0, 20),
      sigma ~ dexp(0.10)
    ), 
    chains = 4, cores = 4, iter = 2000,
    log_lik = TRUE 
  )

ec2 <- 
  ulam(
    data = Grade2,
    alist(
      post ~ dnorm(mu, sigma),
      mu <- bp * pre  + bg[group_idx],
      bg[group_idx] ~ dnorm(0, 20),
      bp ~ dnorm(1, 1),
      sigma ~ dexp(0.10)
    ), 
    chains = 4, cores = 4, iter = 2000,
    log_lik = TRUE
  )

3. Model Predictions.

For each model, create a plot that shows the predictions the model makes together with uncertainty in those predictions and the data. (There is not one right plot here, but there will be style points for how well your model communicates.)
How do the two models’ predictions differ?

4. Is one of these two models is better? If so which? Explain what you mean by better and how you decided.

5. According to the better model (or model ec2 if they are roughly equivalent), does it appear that Electric Company improves reading?

Including school id

So far our models have not taken into account that we have a paired design.

6. Add school id to ec2 as part of a multilevel model. Let’s call it ec3.

Why do we want to use a multilevel model here? [Hint: What would the alternative be?]
Explain your choice of prior for ec3.

7. What does ec3 think about the effectiveness of watching Electric Company? How does this answer compare to ec2?

8. What does ec3 think about school-to-school variation?

9. ec3 has a lot more parameters than ec2. Which model does WAIC indicate is a better choice?

Note: in a complete treatment of thes data, we might include other things as well. Examples: all four grades of students, the city in which the schools were, varying slopes, and interactions. These were not included here to keep things a tad simpler.

Method instead of group

There were actually 3 groups of classes in this study, not two. Some of the classes watched Electric Company as a replacement for their usual reading programs (R). Other classes watched Electric Company as a supplement to their regular reading programs (S).

10. Create a new model ec4 that has three groups (R, S, -) instead of two. Modify your ec3 if that is working successfully, else modify ec2. Provide the precis() output for your new model.

11. Does ec4 think that there is a preferred method? If so, which method? How much better?

Final Exam (Take Home)

Stat 341 — Spring 2021

Due: Thursday, May 13 @ 11:59 pm

Instructions

The Electric Company

The data

Two Models

Including school id

Method instead of group