8 Multiple Regression
1. The SAT data set in the mosaicData package includes the following variables for each state in the US:
sat – the average SAT score in 1994-95,
expend – how much money was spent per student on education in 1994-95 (in thousands of US dollars), and
frac – the fraction of students in the state who took the SAT in 1994-95.
Fit a model that predicts sat from expend and frac.
Compute a 95% HDI for each of the parameters in your model (on the natural scale).
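"On the natural scale" matters if you fit the model to standardized variables. In that case the posterior draws need to be transformed back before you compute HDIs. Here is a sketch, assuming the outcome and both predictors were standardized before fitting. The symbols below are illustrative and not part of the exercise. Write $\tilde y = (y - \bar y)/s_y$ and $\tilde x_j = (x_j - \bar x_j)/s_j$, where $\bar{\cdot}$ and $s$ are the sample mean and standard deviation, and suppose the model is $\tilde y \sim \text{Normal}(\mu, \sigma)$ with $\mu = \alpha + \beta_1 \tilde x_1 + \beta_2 \tilde x_2$. Substituting the definitions and solving for $y$ gives

$$
y = \bar y + s_y\left(\alpha + \beta_1 \frac{x_1 - \bar x_1}{s_1} + \beta_2 \frac{x_2 - \bar x_2}{s_2}\right) + \text{noise},
$$

so the natural-scale parameters are

$$
\beta_1^{*} = \frac{s_y}{s_1}\,\beta_1, \qquad
\beta_2^{*} = \frac{s_y}{s_2}\,\beta_2, \qquad
\alpha^{*} = \bar y + s_y\,\alpha - \beta_1^{*}\,\bar x_1 - \beta_2^{*}\,\bar x_2, \qquad
\sigma^{*} = s_y\,\sigma .
$$

The intercept $\alpha^{*}$ combines several parameters. For that reason, apply these formulas to each posterior draw and compute the HDI from the transformed draws, rather than transforming the endpoints of the standardized-scale HDIs.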
Compare the posterior distribution for the coefficient on expend in this model with the one from the model in Exercise 7.5. How do you interpret the difference?
How do you interpret the posterior distribution of the coefficient on frac? What does this tell us about SAT scores (or at least about what the model thinks about SAT scores)?
Compute the residual (i.e., the mean of the posterior residual distribution) for the state of Michigan.
Are there any states that this model predicts particularly poorly?
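You can compute the residual for a single state from posterior draws of the linear predictor. Below is a minimal base-R sketch. It assumes you already have a matrix of draws of mu with one row per draw and one column per state. The rethinking function link() returns a matrix in that layout. The helper name mean_posterior_residual is hypothetical.

```r
# Hypothetical helper: mean of the posterior residual distribution for each case.
# mu_draws: matrix of posterior draws of mu, one row per draw, one column per case.
# y:        observed outcomes, in the same order as the columns of mu_draws.
mean_posterior_residual <- function(mu_draws, y) {
  stopifnot(ncol(mu_draws) == length(y))
  # sweep() subtracts y[i] from column i (giving mu - y); negate to get y - mu.
  resid_draws <- -sweep(mu_draws, 2, y)
  # Average each column of residual draws: one mean residual per case.
  colMeans(resid_draws)
}

# Toy check with made-up numbers (not the SAT data):
# the column means of mu are 2, 3, 4, so the residuals are 0, -1, 2.
mu <- matrix(c(1, 2, 3,
               3, 4, 5), nrow = 2, byrow = TRUE)
mean_posterior_residual(mu, y = c(2, 2, 6))
```

With a fitted model (here hypothetically called fit), calling link(fit) with no data argument returns draws of mu for the data used in fitting. You can then pass that matrix and the outcome to the helper and pick out the Michigan entry. That step assumes the state names are stored in SAT$state. If sat was standardized before fitting, use the standardized outcome or back-transform mu first. The mean of y - mu over draws equals y minus the posterior mean of mu, so y - colMeans(mu_draws) gives the same answer.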
2. The SAT data set in the mosaicData package includes the following variables for each state in the US:
sat – the average SAT score in 1994-95,
ratio – the average student-teacher ratio in the state, and
frac – the fraction of students in the state who took the SAT in 1994-95.
Fit a model that predicts sat from ratio and frac.
Compute a 95% HDI for each of the parameters in your model (on the natural scale).
How do you interpret the posterior distribution of the coefficient on ratio? What does this tell us about SAT scores (or at least about what the model thinks about SAT scores)?
How do you interpret the posterior distribution of the coefficient on frac? What does this tell us about SAT scores (or at least about what the model thinks about SAT scores)?
Compute the residual (i.e., the mean of the posterior residual distribution) for the state of Michigan.
Are there any states that this model predicts particularly poorly?
3. Do Rethinking 6M2 with the following additional instructions and clarifications.
In your simulation, choose x to be uniformly distributed across the range from 0 to 10. You can do this with runif() or by sampling from 0:10. (Those will be different, but either one should work.)
In your simulation, y should depend on z but not (directly) on x, as the DAG shows.
Compute and show the correlation coefficient between your simulated x and z. Let’s say anything above 0.95 qualifies as “very large”. (But don’t let it be 1.) Tweak your simulation if you don’t get a “very large” correlation. [Side note: You can apply the cor() function to a data frame to get a table of all the pairwise correlations.]
Fit your model(s) using ulam(). Include some diagnostic output (traceplots, effective sample size, etc.) to check that the algorithm appears to be converging properly.
Now answer the questions from 6M2.
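A simulation meeting these requirements might look like the following base-R sketch. It assumes the 6M2 DAG is X → Z → Y. The sample size, seed, slopes of 1, and noise standard deviations are arbitrary choices, not part of the exercise. You may need different values to get the behavior you want.

```r
set.seed(6)  # arbitrary seed, for reproducibility
n <- 100     # arbitrary sample size

# x: uniformly distributed on [0, 10]
# (sample(0:10, n, replace = TRUE) is the discrete alternative)
x <- runif(n, min = 0, max = 10)

# z depends on x plus a little noise, so cor(x, z) is large but not exactly 1
z <- x + rnorm(n, mean = 0, sd = 0.5)

# y depends on z only; x affects y only through z (X -> Z -> Y)
y <- z + rnorm(n, mean = 0, sd = 1)

sim <- data.frame(x = x, y = y, z = z)

# Table of all pairwise correlations; check that cor(x, z) exceeds 0.95
cor(sim)
```

Shrinking the sd of the noise added to z pushes the correlation between x and z closer to 1. Keeping it above zero ensures the correlation never reaches exactly 1.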
4. Return to the previous problem. Fit a model using only z as a predictor and compare (via a suitable plot) the predictions of this model with those of the model in the previous exercise.