8 Multiple Regression

1. The SAT data set in the mosaicData package includes the following variables for each state in the US:

sat – the average SAT score in 1994-95.
expend – how much money was spent per student on education in 1994-95 (in thousands of US dollars), and
frac – the fraction of students in the state who took the SAT in 1994-95.

Fit a model that predicts sat from expend and frac.
Compute a 95% HDI for each of the parameters in your model (on the natural scale).
Compare the posterior distribution for the coefficient on expend in this model to the one in the model from Exercise 7.5. How do you interpret the difference?
How do you interpret the posterior distribution of the coefficient on frac? What does this tell us about what the model thinks about SAT scores?
Compute the residual (i.e., the mean of the posterior residual distribution) for the state of Michigan.
Are there any states that this model predicts particularly poorly?
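
As a reminder of what "mean of the posterior residual distribution" means, here is a minimal sketch in base R. Every number in it is a hypothetical placeholder, not a result from the SAT data: the vectors a, b_expend, and b_frac stand in for posterior draws that would in practice come from your fitted model (for example via extract.samples() in the rethinking package), and obs_state holds made-up values for a single state.

```r
# Hypothetical posterior draws, for illustration only (NOT fitted to the SAT data).
set.seed(1)
n_draws  <- 4000
a        <- rnorm(n_draws, mean = 1000, sd = 5)    # placeholder intercept draws
b_expend <- rnorm(n_draws, mean = 10,   sd = 2)    # placeholder slope draws for expend
b_frac   <- rnorm(n_draws, mean = -3,   sd = 0.2)  # placeholder slope draws for frac

# Made-up predictor values and observed response for one state.
obs_state <- list(expend = 6, frac = 50, sat = 900)

# One value of the model's mean response per posterior draw.
mu_draws <- a + b_expend * obs_state$expend + b_frac * obs_state$frac

# Posterior distribution of the residual, and its mean.
resid_draws <- obs_state$sat - mu_draws
resid_mean  <- mean(resid_draws)
resid_mean
```

Because the residual is computed draw by draw, the same resid_draws vector can also be summarized with an interval rather than just its mean.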

2. The SAT data set in the mosaicData package includes the following variables for each state in the US:

sat – the average SAT score in 1994-95.
ratio – the average student-teacher ratio in the state.
frac – the fraction of students in the state who took the SAT in 1994-95.

Fit a model that predicts sat from ratio and frac.
Compute a 95% HDI for each of the parameters in your model (on the natural scale).
How do you interpret the posterior distribution of the coefficient on ratio? What does this tell us about what the model thinks about SAT scores?
How do you interpret the posterior distribution of the coefficient on frac? What does this tell us about what the model thinks about SAT scores?
Compute the residual (i.e., the mean of the posterior residual distribution) for the state of Michigan.
Are there any states that this model predicts particularly poorly?

3. Do Rethinking 6M2 with the following additional instructions/clarifications.

In your simulation, choose x to be uniformly distributed over the range from 0 to 10. You can do this with runif() or by sampling from 0:10. (Those will be different, but either one should work.)
In your simulation, y should depend on z but not (directly) on x, as the DAG shows.
Compute and show the correlation coefficient between your simulated x and z. Let’s say anything above 0.95 qualifies as “very large”. (But don’t let it be 1.) Tweak your simulation if you don’t get a “very large” correlation. [Side note: You can apply the cor() function to a data frame to get a table of all the pairwise correlations.]
Fit your model(s) using ulam(). Include some diagnostic output (traceplots, effective sample size, etc.) to make sure that the algorithm appears to be converging properly.
Now answer the questions from 6M2.
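
One possible way to set up the simulation described above is sketched here in base R. The sample size, the noise standard deviations, and the slope of y on z are arbitrary choices made for illustration, not values given in the exercise. You may need to adjust them to get the correlation you want.

```r
set.seed(6)
n <- 100                                   # assumed sample size

# x is uniformly distributed on [0, 10]
x <- runif(n, min = 0, max = 10)

# z depends on x; a small noise sd keeps cor(x, z) very large but below 1
z <- rnorm(n, mean = x, sd = 0.5)

# y depends on z only, not (directly) on x; the slope of 2 is an arbitrary choice
y <- rnorm(n, mean = 2 * z, sd = 1)

sim <- data.frame(x = x, z = z, y = y)

# Table of all pairwise correlations
round(cor(sim), 3)
```

Making the noise sd in z larger lowers the correlation between x and z, and making it smaller pushes the correlation toward 1.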

4. Return to the previous problem. Fit a model using only z as a predictor and compare (via a suitable plot) the predictions of this model to those of the model from the previous problem.