I saw lots of good things on this test, and it also revealed some things that some of us don’t fully understand yet. Some of those things are to be expected early in the course – others are things we really want to get in place ASAP. Be sure to look over the comments below and the ones on your test.
For part a, I used a binary system to record points off (adding combinations of -1, -2, and -4 points). So if you had 3 points off, you will see both the -1 and -2 options marked.
Some important things about prior, likelihood, and posterior.
A key word that should appear in your description of all three is parameter or parameters – understanding how each is related to parameters is key to understanding what these are.
The prior and posterior are both distributions. The likelihood function is not a distribution. But all three are functions of the parameters.
Although we talked about hypotheses the first day or two as a way to motivate Bayesian inference, the word hypothesis really should not be anywhere in these descriptions.
Be careful about how you use the word data. In particular, the prior does not try to tell us something about the data per se. The data enter via the likelihood function.
Note that the prior and likelihood do not influence or depend on each other, but both are used to compute the posterior. I saw several answers that made incorrect connections between prior and likelihood.
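In symbols, with \(\theta\) standing for the parameter(s) and \(D\) for the data:

\[
\underbrace{p(\theta \mid D)}_{\text{posterior}} \;\propto\; \underbrace{p(D \mid \theta)}_{\text{likelihood}} \times \underbrace{p(\theta)}_{\text{prior}}
\]

Notice that the prior and the likelihood each appear once on the right side; neither is computed from the other, and all three are functions of \(\theta\).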
I used binary grading here as well.
It is important to understand what the parameters mean and what the summaries of the posterior distributions for the parameters tell us. For understanding the parameters, often a little bit of algebraic thinking is important – what role does the parameter have in the model equation(s)? Be sure to interpret each one in the context of the others.
\(\sigma\) is not an estimate for the standard deviation of the response for all subjects. It is an estimate for the standard deviation of the response for all subjects with the same predictor variable values. That is a big difference. (In fact, comparing those two is one way of measuring the fit of a model.)
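Here is a quick sketch of that comparison; the data frame and model fit names are hypothetical.

```r
library(rethinking)
# SD of the response across all subjects, ignoring the predictors
# (d is a hypothetical data frame)
sd(d$response)
# posterior summary of a hypothetical model fit; compare the sigma row,
# which estimates the SD among subjects with identical predictor values
precis(model)
```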
Some of you didn’t really address (correctly) what the HDIs are telling us. Don’t get fixated on the posterior mean to the exclusion of the spread of the posterior distribution. (Remember, the posterior is a distribution, not either of these summaries, but providing a sense of center and spread provides a good deal of information about the distribution.)
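For example, with posterior samples in hand (here a hypothetical data frame Post with a column of samples for \(\sigma\)), you can report both center and spread:

```r
library(HDInterval)
mean(Post$sigma)                    # center of the posterior
hdi(Post$sigma, credMass = 0.95)    # 95% HDI describes the spread
```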
Be mindful of units and especially of making direct comparisons of things that have different units.
If this problem gave you trouble, it might be good for us to have a conversation.
When talking about variability, always make it clear whether you are talking about variability in the posterior distribution (of some parameter) or talking about variability in (some subset of) the data.
This mostly went pretty well, and I think the comments in Gradescope probably suffice.
Several of you had more than one likelihood. For a given model, there is just one likelihood (and one prior and one posterior). Each of these involves all of the parameters. You could have split the data into two pieces and used two models – one for each piece – and then used the posterior of the first model as the prior of the second, but none of you took that approach. This is a particularly simple situation, and the two parameters don’t interact much in the model, but that isn’t usually the case. Even just changing the prior here can make things more complicated.
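For the record, here is why that two-step approach works (assuming the two pieces of data are independent given the parameters):

\[
p(\theta \mid D_1, D_2) \;\propto\; p(D_2 \mid \theta)\, p(\theta \mid D_1),
\]

so the posterior from the first piece of data plays the role of the prior for the second. But within each step there is still exactly one prior, one likelihood, and one posterior.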
Several of you made plots with a formula of the general type posterior ~ parameter. This does not work when your grid has multiple parameters, and some of your plots showed that something was wrong. If you have a 100 by 100 grid, then there will be 100 different posterior values for each value of one parameter, so your plot will be showing 100 different values, not a single value. And none of those values is the correct value. The easiest way to deal with this is probably posterior sampling, but you can also sum across the other parameters.
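Here is a sketch of both fixes for a two-parameter grid. The grid, parameter names, and data values are all hypothetical; only the summing and sampling patterns matter.

```r
library(dplyr)
library(ggformula)

# hypothetical 100 x 100 grid with a normalized posterior column
Grid <-
  expand.grid(
    mu    = seq(20, 40, length.out = 100),
    sigma = seq(1, 10, length.out = 100)
  ) %>%
  mutate(
    prior      = dnorm(mu, 30, 10) * dunif(sigma, 1, 10),
    likelihood = dnorm(32, mu, sigma) * dnorm(28, mu, sigma),  # made-up data
    posterior  = prior * likelihood / sum(prior * likelihood)
  )

# WRONG: 100 posterior values for each value of mu
# gf_point(posterior ~ mu, data = Grid)

# Fix 1: sum over the other parameter first
Marginal <- Grid %>%
  group_by(mu) %>%
  summarise(posterior = sum(posterior))
gf_point(posterior ~ mu, data = Marginal)

# Fix 2: posterior sampling, then plot the samples
Samples <- Grid %>%
  slice_sample(n = 10000, weight_by = posterior, replace = TRUE)
gf_dhistogram(~ mu, data = Samples)
```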
The main difference between using a normal distribution and a log-normal distribution for a parameter is that a normal distribution allows the parameter value to be negative, while a log-normal distribution does not (all of its probability is on positive values).
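A quick way to see this (the parameter values here are arbitrary):

```r
library(ggformula)
gf_dist("norm", mean = 0, sd = 1)         # support includes negative values
gf_dist("lnorm", meanlog = 0, sdlog = 1)  # support is positive values only
```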
Something I did not take points off for was drawing a conclusion about a difference by looking for “overlap” of the posterior distributions. This is not a good approach and can lead to incorrect conclusions. The proper way to do this is to compute a posterior distribution for the difference.
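A sketch of the right approach, assuming posterior samples for the two quantities live in columns mu1 and mu2 of a data frame Post (all names hypothetical):

```r
library(dplyr)
library(HDInterval)
Post <- Post %>% mutate(diff = mu1 - mu2)
hdi(Post$diff)        # HDI for the difference
mean(Post$diff > 0)   # posterior probability that the difference is positive
```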
The rubric items here should be helpful (I hope).
One of the more common errors was failing to use sim() (or to do essentially the same type of computation yourself) when trying to determine the precision of an individual estimate.
Note: You can’t compute a residual for a counterfactual because you won’t have an observed response to compare with the prediction. Some of you did some interesting things to try to work around this problem. But an HDI computed from sim() is a better approach.
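As a sketch (the model fit, predictor name, and counterfactual value here are hypothetical):

```r
library(rethinking)
counterfactual <- data.frame(predictor = 5)
sims <- sim(model, data = counterfactual)  # simulated individual responses
apply(sims, 2, HPDI, prob = 0.95)          # HDI for an individual response
```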
I saw a nice variety of useful plots for checking the fit. But I also saw some strange plots (and several of you noted that your plots looked strange, but didn’t seem to know what was going on). When checking fits, you need to take into account all of the predictors. So a plot of response ~ predictor1 is going to be noisy if you are ignoring the other predictors. There are several ways to do this, including using separate panels or colors, or comparing predicted and observed responses. Here is an example of a plot that is a little different from most.
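Code along these lines could produce such a plot; the model fit, data frame, and variable names are hypothetical.

```r
library(rethinking)
library(ggformula)
mu <- link(model)                    # posterior samples for the mean response
Pred <- data.frame(
  observed  = d$response,            # d is a hypothetical data frame
  predicted = apply(mu, 2, mean),
  lo        = apply(mu, 2, HPDI, prob = 0.95)[1, ],
  hi        = apply(mu, 2, HPDI, prob = 0.95)[2, ]
)
gf_pointrange(predicted + lo + hi ~ observed, data = Pred) %>%
  gf_abline(slope = 1, intercept = 0, linetype = "dotted")
```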
A perfect fit would place the dots directly on the dotted line. This plot shows HDIs for the mean response; it could also be done with HDIs for an individual response. A type of plot that more of you made looked something like this.
Notice how color is used to separate the different species since the model makes a different prediction for each species, even if the sepal width is the same.
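A sketch of code for that kind of plot; the response variable and the model fit are hypothetical (here, predicting sepal length from sepal width and species in the iris data):

```r
library(rethinking)
library(ggformula)
library(dplyr)
mu <- link(model)                    # posterior samples for the mean response
iris2 <- iris %>%
  mutate(predicted = apply(mu, 2, mean))  # assumes rows align with link() columns
gf_point(Sepal.Length ~ Sepal.Width, color = ~ Species, data = iris2) %>%
  gf_line(predicted ~ Sepal.Width, color = ~ Species, data = iris2)
```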