- Express models in mathematical notation
- Convert mathematical notation into quap() code
- Think a bit about priors
- Investigate models using prior sampling & posterior sampling
McElreath says

I use the name d over and over again in this book to refer to the data frame we are working with at the moment. I keep its name short to save you typing.

This is my least favorite feature of this text. It is very error prone (easy to grab the wrong data) and confusing (which data is d just now?). The data being used are a really important part of your model – you always want to know what the data are. So… I will not follow this practice and I encourage you to avoid it as well.
I will usually capitalize the names of data frames and use lower case for the variables inside them. This helps to recognize which is which. But it is an uncommon convention in the R community. (The use of capitalization in the R world is mostly haphazard with some preference for lower case. You will see CamelCase, camelCase, snake_case, dot.case, and sometimes all mixed together in the same code. All I can say about this is “I’m sorry”.)
In R, there are typically 4 functions associated with each distribution. Suppose \(X \sim {\sf Dist}(...)\), where \({\sf Dist}\) is a place holder for the distribution family and \(...\) are the parameters of the distribution.
function | key word | what it computes |
---|---|---|
ddist(x, ...) | density | the pmf or pdf function for \(X\) |
pdist(q, ...) | probability | \(P(X \le q)\) |
qdist(p, ...) | quantile | smallest \(q\) such that \(P(X \le q) \ge p\) |
rdist(n, ...) | random | generate n random draws from the distribution |
In addition, the gf_dist() function can be used to plot a distribution.

Many examples of dist: norm, beta, unif, exp, gamma, binom, lnorm, triangle, t, etc.
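For example, substituting norm for dist (and using a mean of 100 and a standard deviation of 15, a familiar IQ-style scale):

dnorm(120, mean = 100, sd = 15)       # density (pdf) at 120
pnorm(120, mean = 100, sd = 15)       # P(X <= 120)
qnorm(0.75, mean = 100, sd = 15)      # smallest q with P(X <= q) >= 0.75
rnorm(3, mean = 100, sd = 15)         # 3 random draws
gf_dist('norm', mean = 100, sd = 15)  # plot the distribution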
If you put an x in front of some (the most common ones) of these you get something extra – a plot.
xpnorm(120, mean = 100, sd = 15)
xpbeta(0.8, shape1 = 8, shape2 = 2)
The p-function and q-function are inverses. The p-function takes a value as input and returns a probability. The q-function takes a probability as input and returns a value.
# using default values of mean = 0, sd = 1
qnorm(pnorm(2))
## [1] 2
pnorm(qnorm(0.2))
## [1] 0.2
These functions are vectorized. Vectorization happens over all arguments simultaneously. (This is why we needed functions like map_dbl() and map2_dbl() in our grid method examples.)
qnorm(0.025, mean = c(10, 20, 30), sd = c(1, 2, 3))
## [1] 8.040036 16.080072 24.120108
pnorm(c(12, 22, 21), mean = c(10, 20, 30), sd = c(1, 2, 3))
## [1] 0.977249868 0.841344746 0.001349898
runif(3, min = 1:3, max = 2:4)
## [1] 1.720904 2.875773 3.760982
Unlike most packages in R, the rethinking package does not use lazy data. This means you must explicitly load the data sets each time you use them. Here’s how:
library(rethinking) # load the package (already done in our template Rmd)
data(Howell1) # load a data set
precis(Howell1) # precis is a fancy word for summary
## mean sd 5.5% 94.5% histogram
## height 138.2635963 27.6024476 81.108550 165.73500 ▁▁▁▁▁▁▁▂▁▇▇▅▁
## weight 35.6106176 14.7191782 9.360721 54.50289 ▁▂▃▂▂▂▂▅▇▇▃▂▁
## age 29.3443934 20.7468882 1.000000 66.13500 ▇▅▅▃▅▂▂▁▁
## male 0.4724265 0.4996986 0.000000 1.00000 ▇▁▁▁▁▁▁▁▁▇
For our models below, we want just the adults. So let’s create a new (and clearly named) data set with just the adults.
HowellAdults <-
Howell1 %>%
filter(age >= 18) # filter selects only rows that match the condition
To find out more about this data set, type ?Howell1.
With data in hand, we are ready to model.
McElreath proposes this as a first model for heights of these adults.
\[\begin{align*} h_i & \sim \sf{Norm}(\mu, \sigma) \\ \mu & \sim \sf{Norm}(178, 20) \\ \sigma & \sim \sf{Unif}(0, 50) \end{align*}\]
Why Unif(0, 50)?
We can sample from our priors and use the sampled values of \(\mu\) and \(\sigma\) to compute some heights. This will tell us what our prior is saying about observed heights.
Prior4.1 <-
tibble(
mu = rnorm(1e4, 178, 20),
sigma = runif(1e4, 0, 50),
height = rnorm(1e4, mu, sigma)
)
head(Prior4.1)
## # A tibble: 6 x 3
## mu sigma height
## <dbl> <dbl> <dbl>
## 1 202. 47.7 167.
## 2 159. 4.86 166.
## 3 178. 37.2 198.
## 4 224. 44.7 239.
## 5 157. 33.0 122.
## 6 117. 8.03 121.
gf_dens(~height, data = Prior4.1)
Note that if we go overboard with our wide prior, we get silly results:
Prior4.1a <-
tibble(
mu = rnorm(1e4, 178, 100),
sigma = runif(1e4, 0, 50),
height = rnorm(1e4, mu, sigma)
)
last_plot() %>%
gf_dens( ~height, color = "red",
data = Prior4.1a
)
Changing the standard deviation of our prior for \(\mu\) from 20 to 100 says that a noticeable proportion of the population will have heights that are less than 0 or greater than 4 meters. That clearly does not reflect what we know about people, so it would be a poor choice for our prior.
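To make “a noticeable proportion” concrete, here is a quick check of how often this wide prior predicts an impossible height (a small sketch; 272 cm is roughly the tallest person on record, and the column names below are just labels):

Prior4.1a %>%
  summarise(
    prop_below_0   = mean(height < 0),    # proportion of prior-predicted heights below 0 cm
    prop_above_272 = mean(height > 272)   # proportion taller than the tallest person on record
  )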
We can use this sort of prior predictive sampling to investigate the impact of our choice of priors.
Let’s try one more, this time with an extremely small standard deviation for the prior on \(\mu\):
Prior4.1b <-
tibble(
mu = rnorm(1e4, 178, 0.1),
sigma = runif(1e4, 0, 50),
height = rnorm(1e4, mu, sigma)
)
last_plot() %>%
gf_dens( ~ height, color = "steelblue",
data = Prior4.1b
)
hdi(Prior4.1, pars = "height")
hdi(Prior4.1b, pars = "height")
## par lo hi mode prob
## 1 height 108.6064 250.5676 177.6762 0.95
## par lo hi mode prob
## 1 height 114.168 237.8867 177.9269 0.95
Because of the wide prior for \(\sigma\), this doesn’t change the prior predictions as much as you might have expected. But stay tuned for a look at what this does to the posterior.
Now let’s translate our model into quap() code. Here’s the model:
\[\begin{align*} h_i & \sim \sf{Norm}(\mu, \sigma) \\ \mu & \sim \sf{Norm}(178, 20) \\ \sigma & \sim \sf{Unif}(0, 50) \end{align*}\]
And here’s the code:
library(rethinking)
m4.1 <-
quap(
data = HowellAdults,
alist(
height ~ dnorm(mu, sigma),
mu ~ dnorm(178, 20),
sigma ~ dunif(0, 50)
)
)
Things to note:

- Put the model description inside alist(). alist() creates an unevaluated list, which lets quap() process the code itself rather than what the code evaluates to. So always use alist() (not list()) when describing the model.
- Use <- for direct computations (arithmetic).
- Use ~ for priors and likelihood. Think ~ for distributions.
- quap() is doing some tricky things behind the scenes: for example, beta_0 ~ dnorm(178, 100) will turn into dnorm(beta_0, 178, 100, log = TRUE).
- quap() can fail.
The most likely failure you will encounter early on is caused by a bad starting point for the search.
Unless you tell it otherwise, quap() uses prior sampling to pick a place to start its search. So just rerunning will often fix the problem. (Choosing a reasonable prior also helps.) If you use set.seed(), then you will get the same random starting point each time you run the code. If things fail, change your seed and try again.
You can specify a starting point with the start argument to quap().
quap(
data = HowellAdults,
alist(
height ~ dnorm(mu, sigma),
mu ~ dnorm(178, 20),
sigma ~ dunif(0, 50)
),
start = list(mu = 178, sigma = 5)
)
What makes a starting point bad? quap() uses a “hill climbing” algorithm to search for the posterior mode. If the starting point is very far from the posterior mode and the posterior is very flat at the starting point, it is hard to know in which direction to begin the search for the mode – the local terrain doesn’t give much of a clue. So quap() may wander around aimlessly in a vast plain of low posterior density and never get close to the posterior mode. Also, if the posterior is multi-modal, we might get drawn to the wrong local maximum.
library(rethinking)
precis(m4.1)
## mean sd 5.5% 94.5%
## mu 154.606969 0.4119838 153.948539 155.265398
## sigma 7.731128 0.2913667 7.265468 8.196789
The labels 5.5% and 94.5% indicate that we are seeing 89% HDIs.
Why 89%? It’s just the default… I don’t recommend 95% intervals because readers will have a hard time not viewing them as significance tests….
McElreath probably goes a bit overboard here, but he is correct that 95% is just a common convention; there is nothing particularly special about it. And when working with normal distributions, it doesn’t matter what percentage you use since they all communicate exactly the same information. (Once you know any percentage, you can work out the standard deviation and mean, so you know everything there is to know.) But it isn’t clear that choosing a different arbitrary number (89%) is any better than using the conventional arbitrary number (95%). In any case, you can choose whatever percent you like:
precis(m4.1, prob = .92)
## mean sd 4% 96%
## mu 154.606969 0.4119838 153.885714 155.32822
## sigma 7.731128 0.2913667 7.221037 8.24122
Post4.1 <- extract.samples(m4.1)
gf_histogram(~mu, data = Post4.1) /
gf_histogram(~sigma, data = Post4.1)
gf_point(mu ~ sigma, data = Post4.1, alpha = 0.3) %>%
gf_density2d(color = "skyblue")
Let’s take a look at how the posterior changes if we use the super narrow prior for \(\mu\):
m4.2 <-
quap(
data = HowellAdults,
alist(
height ~ dnorm(mu, sigma),
mu ~ dnorm(178, 0.1),
sigma ~ dunif(0, 50)
)
)
Post4.2 <- extract.samples(m4.2)
gf_dens(~ mu, data = Post4.1, color = ~ "Model 1") %>%
gf_dens(~ mu, data = Post4.2, color = ~ "Model 2")
gf_density2d( mu ~ sigma, data = Post4.1, color = ~"Model 1") %>%
gf_density2d( mu ~ sigma, data = Post4.2, color = ~"Model 2")
Notice that the two models have very different posterior ideas about what \(\mu\) and \(\sigma\) might be. Do you see why?
Here’s one way to express this model:
\[\begin{align*} h_i & \sim \sf{Norm}(\mu_i, \sigma) \\ \mu_i & = \beta_0 + \beta_1 * weight_i \\ \beta_0 & \sim ?? \\ \beta_1 & \sim ?? \\ \sigma & \sim ?? \\ \end{align*}\]
But we will prefer a different way:
\[\begin{align*} h_i & \sim \sf{Norm}(\mu_i, \sigma) \\ \mu_i & = \alpha_0 + \beta_1 \cdot (weight_i - \overline{weight}) \\ \alpha_0 & \sim ?? \\ \beta_1 & \sim ?? \\ \sigma & \sim ?? \\ \end{align*}\]
Why do we prefer this centered model?
It is easier to come up with a prior for \(\alpha_0\) than for \(\beta_0\) because \(\beta_0\) is hard to interpret. (On average, how tall are people who weigh nothing?)
Centering reduces or eliminates correlation among parameters in the posterior.
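For reference, the centered and uncentered versions describe the same line, so matching coefficients recovers the uncentered intercept from the centered fit:

\[\begin{align*} \beta_0 + \beta_1 \cdot weight_i &= \alpha_0 + \beta_1 \cdot (weight_i - \overline{weight}) \\ \beta_0 &= \alpha_0 - \beta_1 \cdot \overline{weight} \end{align*}\]

This is the identity used below whenever we compute an intercept as alpha_0 - beta_1 * mean_weight.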
Which prior looks better?
mean_weight <- mean( ~ weight, data = HowellAdults)
Prior4.3a <-
tibble(
alpha_0 = rnorm(100, 178, 20),
beta_1 = rnorm(100, 0, 10),
sigma = runif(100, 0, 50)
)
Prior4.3b <-
tibble(
alpha_0 = rnorm(100, 178, 20),
beta_1 = exp(rnorm(100, 0, 1)),
sigma = runif(100, 0, 50)
)
Prior4.3a %>%
gf_abline(slope = ~ beta_1, intercept = ~ alpha_0 - beta_1 * mean_weight,
alpha = 0.3) %>%
gf_hline(yintercept = ~ c(0, 272), color = "red", linetype = "dashed", data = NA) %>%
gf_lims(x = c(30, 60), y = c(-100, 400)) %>%
gf_labs(y = "height") /
Prior4.3b %>%
gf_abline(slope = ~ beta_1, intercept = ~ alpha_0 - beta_1 * mean_weight,
alpha = 0.3) %>%
gf_hline(yintercept = ~ c(0, 272), color = "red", linetype = "dashed", data = NA) %>%
gf_lims(x = c(30, 60), y = c(-100, 400)) %>%
gf_labs(y = "height")
Let’s go with the second one.
\[\begin{align*} h_i & \sim \sf{Norm}(\mu_i, \sigma) \\ \mu_i & = \alpha_0 + \beta_1 \cdot (weight_i - \overline{weight}) \\ \alpha_0 & \sim \sf{Norm}(178, 20) \\ \beta_1 & \sim \sf{LogNorm}(0, 1) \\ \sigma & \sim \sf{Unif}(0, 50) \\ \end{align*}\]
If \(X\) is normal, why is \(e^X\) called lognormal? Because its logarithm is normally distributed. Confusing, but true.
Let \(X \sim \sf{LogNorm}(\mu, \sigma)\). That is, \(X = e^Y\) where \(Y \sim \sf{Norm}(\mu, \sigma)\). In R, the parameters are called meanlog and sdlog. Here are some example plots.
gf_dist('lnorm', meanlog = 0, sdlog = 1,
fill = ~ "LogNorm(0,1)", alpha = 0.3, geom = "area") %>%
# gf_dist('lnorm', meanlog = 0, sdlog = 2,
# fill = ~ "LogNorm(0,2)", alpha = 0.3, geom = "area") %>%
gf_dist('lnorm', meanlog = 0, sdlog = 1/2,
fill = ~ "LogNorm(0,1/2)", alpha = 0.3, geom = "area") %>%
gf_dist('lnorm', meanlog = 1, sdlog = 1,
fill = ~ "LogNorm(1,1)", alpha = 0.3, geom = "area") %>%
gf_lims(x = c(0,15))
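Another way to see the connection: exponentiating normal draws gives (up to simulation noise) the same density as drawing from rlnorm() directly. Here is a small simulation sketch (the name LogNormCheck and its columns are just illustrative):

LogNormCheck <-
  tibble(
    via_exp    = exp(rnorm(1e4, 0, 1)),                # exponentiate Norm(0, 1) draws
    via_rlnorm = rlnorm(1e4, meanlog = 0, sdlog = 1)   # draw from LogNorm(0, 1) directly
  )
gf_dens(~ via_exp, data = LogNormCheck, color = ~ "exp(rnorm)") %>%
  gf_dens(~ via_rlnorm, data = LogNormCheck, color = ~ "rlnorm") %>%
  gf_lims(x = c(0, 10))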
The \(\mu\) parameter determines the median of the log-normal distribution: since \(P(e^Y \le e^{\mu}) = P(Y \le \mu) = 1/2\), the median of \(X = e^Y\) is \(e^{\mu}\), no matter what \(\sigma\) is.
qlnorm(0.5, meanlog = 0, sdlog = 0:4)
## [1] 1 1 1 1 1
qlnorm(0.5, meanlog = 1, sdlog = 0:4)
## [1] 2.718282 2.718282 2.718282 2.718282 2.718282
The following table of statistics (also available at https://en.wikipedia.org/wiki/Log-normal_distribution) can be useful in determining good choices for \(\mu\) and \(\sigma\) in a prior.
statistic | expression |
---|---|
median | \(e^{\mu} = \exp(\mu)\) |
mean | \(\exp\left(\mu + \frac{\sigma^2}{2}\right)\) |
mode | \(\exp(\mu - \sigma^2)\) |
variance | \(\left[\exp(\sigma^2) - 1\right] \exp(2\mu + \sigma^2)\) |

So larger \(\mu\) and larger \(\sigma\) make for larger variance, but the impact of \(\sigma\) is greater (because of the squaring).
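As a quick sanity check of the mean formula, we can compare the theoretical value to a simulation (a small sketch):

exp(0 + 1^2 / 2)                            # theoretical mean of LogNorm(0, 1), about 1.65
mean(rlnorm(1e5, meanlog = 0, sdlog = 1))   # simulated mean; should be close to the value above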
m4.3 <-
quap(
data = HowellAdults,
alist(
height ~ dnorm(mu, sigma),
mu <- alpha_0 + beta_1 * (weight - mean(weight)),
alpha_0 ~ dnorm(178, 20),
beta_1 ~ dlnorm(0, 1),
sigma ~ dunif(0, 50)
)
)
Anything that depends on parameters has a posterior distribution.
Post4.3 <- m4.3 %>% extract.samples(1e4) %>%
mutate(beta_0 = alpha_0 - beta_1 * mean(HowellAdults$weight))
(
gf_dens( ~ alpha_0 , data = Post4.3) |
gf_blank(0 ~ 0) %>% gf_theme(theme_void())
) /
(
gf_density2d(beta_1 ~ alpha_0, data = Post4.3) |
gf_dens(beta_1 ~ ., data = Post4.3)
)
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
##
## Attaching package: 'GGally'
## The following object is masked from 'package:pander':
##
## wrap
ggpairs(Post4.3, upper = list(continuous = "density"))
gf_point(height ~ weight, data = HowellAdults) %>%
gf_abline(
slope = ~ beta_1, intercept = ~ beta_0,
color = "steelblue", alpha = 0.1,
data = Post4.3 %>% head(100))
\[\begin{align*} h_i & \sim \sf{Norm}(\mu_i, \sigma) \\ \mu_i & = \alpha_0 + \beta_1 \cdot (weight_i - \overline{weight}) + \beta_2 * male \\ \alpha_0 & \sim \sf{Norm}(178, 20) \\ \beta_1 & \sim \sf{LogNorm}(0, 1) \\ \beta_2 & \sim \sf{Norm}(0, 10) \\ \sigma & \sim \sf{Unif}(0, 50) \\ \end{align*}\]
Note: The male variable is either 0 (for females) or 1 (for males).
m4.4 <-
quap(
data = HowellAdults,
alist(
height ~ dnorm(mu, sigma),
mu <- alpha_0 + beta_1 * (weight - mean(weight)) + beta_2 * male,
alpha_0 ~ dnorm(178, 20),
beta_1 ~ dlnorm(0, 1),
beta_2 ~ dnorm(0, 10),
sigma ~ dunif(0, 50)
)
)
Post4.4 <- extract.samples(m4.4, n = 1e3) %>%
mutate(beta_0 = alpha_0 - beta_1 * mean_weight)
precis(m4.4)
## mean sd 5.5% 94.5%
## alpha_0 151.5613770 0.33723981 151.0224027 152.1003514
## beta_1 0.6407607 0.04124548 0.5748424 0.7066789
## beta_2 6.4840091 0.53268348 5.6326780 7.3353402
## sigma 4.2537758 0.16031290 3.9975649 4.5099868
ggpairs(Post4.4, upper = list(continuous = "density"))
gf_point(height ~ weight, color = ~male, data = HowellAdults) %>%
gf_abline(slope = ~ beta_1, intercept = ~ beta_0, alpha = 0.1,
color = ~0,
data = Post4.4 %>% head(100)) %>%
gf_abline(slope = ~ beta_1, intercept = ~ beta_0 + beta_2, alpha = 0.1,
color = ~1,
data = Post4.4 %>% head(100))
last_plot() %>%
gf_facet_grid( ~ male)
In this section, we want to refine our approach of plotting 100 posterior lines on the plot in two ways:

- Instead of 100 individual lines, we would like to see the upper and lower bound for some fraction of the lines (at each weight).
- In addition to plotting the lines, which represent average height, we would like to see a range of plausible values for individual heights (at each weight).

Before dealing with each of these at every weight, we’ll start by looking at just one weight and then figure out how to scale up.
What does our model say about the heights of people who weigh 50 kg?
- On average their height should be \(\mu = \alpha_0 + \beta_1 (50 - \overline{weight})\).
- An individual height should be a draw from a \(\sf{Norm}(\alpha_0 + \beta_1 (50 - \overline{weight}), \sigma)\)-distribution.
Keep two things in mind:
We don’t have values for the parameters (\(\alpha_0\), \(\beta_1\), and \(\sigma\)), we have (posterior) distributions. So we will have distributions as answers to these new questions as well. We may choose to summarize the distributions with a number (perhaps the mean or the median), or a range (an HDI), but these are fundamentally distributions.
The posterior distribution is what the model says (based on the data, the likelihood, and the priors). That doesn’t make it true. So in addition to figuring out what a models says, we will want to investigate whether the model should be believed in the first place. Part of that can be determined by comparing what the model says to what we know or think we know, including by comparing things to the data used to fit the model. (But performing well on the data used to fit the model is a very low bar – and fitting very well can actually be a bad thing. More on this later.)
OK. Let’s compute those posterior distributions. We’ll do this by adding additional columns to our posterior samples.
n <- nrow(Post4.3)
Post4.3 <-
Post4.3 %>%
mutate(
mean50 = beta_0 + beta_1 * 50,
individual50 = rnorm(n, mean50, sigma)
)
Post4.3 %>%
gf_dens( ~ mean50, color = ~ "mean50") %>%
gf_dens( ~ individual50, color = ~ "individual50") %>%
gf_labs(x = "height | weight = 50")
Post4.3 %>% mean_hdi(mean50)
## mean50 .lower .upper .width .point .interval
## 1 159.1285 158.4729 159.821 0.95 mean hdi
Post4.3 %>% mean_hdi(individual50)
## individual50 .lower .upper .width .point .interval
## 1 159.0531 148.564 168.4919 0.95 mean hdi
gf_point(height ~ weight, data = HowellAdults) %>%
gf_pointrange(individual50 + .lower + .upper ~ 50,
color = ~ "individual", size = 1.2,
data = Post4.3 %>% mean_hdi(individual50)) %>%
gf_pointrange(mean50 + .lower + .upper ~ 50,
color = ~ "average", size = 0.4,
data = Post4.3 %>% mean_hdi(mean50))
As we see (and as makes sense), the model makes much more precise estimates about the average than about individuals.
Now we would like to do this same thing for weights all across the range of weights in our data.
First, let’s write a couple of functions that take posterior samples and a weight and produce an HDI for the mean or individual height.
avg_height <- function(post, weight, prob = 0.95) {
post %>%
mutate(mean_height = beta_0 + beta_1 * weight) %>%
mean_hdi(mean_height, .width = prob) %>%
mutate(weight = weight)
}
ind_height <- function(post, weight, prob = 0.95) {
n <- nrow(post)
post %>%
mutate(ind_height = rnorm(n, beta_0 + beta_1 * weight, sigma)) %>%
mean_hdi(ind_height, .width = prob) %>%
mutate(weight = weight)
}
avg_height(Post4.3, 50)
## mean_height .lower .upper .width .point .interval weight
## 1 159.1285 158.4729 159.821 0.95 mean hdi 50
ind_height(Post4.3, 50)
## ind_height .lower .upper .width .point .interval weight
## 1 159.2391 149.1294 169.1683 0.95 mean hdi 50
avg_height(Post4.3, 40)
## mean_height .lower .upper .width .point .interval weight
## 1 150.0943 149.4073 150.7312 0.95 mean hdi 40
ind_height(Post4.3, 40)
## ind_height .lower .upper .width .point .interval weight
## 1 150.124 139.8024 159.9972 0.95 mean hdi 40
Now we want to apply these functions to a lot of different weights.
Avg <- map_dfr(seq(30, 65, by = 1), ~ avg_height(Post4.3, .x))
Ind <- map_dfr(seq(30, 65, by = 1), ~ ind_height(Post4.3, .x))
Avg %>% reactable::reactable()
Ind %>% reactable::reactable()
And finally, we can plot the results:
gf_point(height ~ weight, data = HowellAdults) %>%
gf_line(.lower ~ weight, data = Avg, color = ~"avg") %>%
gf_line(.upper ~ weight, data = Avg, color = ~"avg") %>%
gf_line(.lower ~ weight, data = Ind, color = ~"ind") %>%
gf_line(.upper ~ weight, data = Ind, color = ~"ind")
gf_ribbon(.lower + .upper ~ weight, data = Ind, fill = ~"ind") %>%
gf_ribbon(.lower + .upper ~ weight, data = Avg, fill = ~"avg") %>%
gf_point(height ~ weight, data = HowellAdults, inherit = FALSE)
Notice that most but not all of the dots fall in the blue band. That’s a good sign. The model predicts that 95% of heights will fall in the blue region.
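We can make that rough claim quantitative (a small sketch: round each observed weight to the nearest kilogram, look up the interval computed above for that weight, and see how often the observed height lands inside it; the result should be in the neighborhood of 95%, though not exactly, since the band was only computed at whole-number weights):

HowellAdults %>%
  mutate(w = round(weight)) %>%                 # nearest whole-kg weight
  left_join(Ind, by = c("w" = "weight")) %>%    # attach the 95% band for that weight
  summarise(coverage = mean(height >= .lower & height <= .upper, na.rm = TRUE))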
This is a useful exercise for understanding how the posterior sampling is working, and in case you ever need to do something special for which there is no pre-made tool.
But creating posterior distributions of averages and simulated individuals is very common, so we’d like a simpler way to get those.
link() and sim()

The link() and sim() functions start with a model, extract posterior samples, and then compute the average response or a simulated individual response for each row in the posterior – just like we did above.
Link4.3 <-
link(m4.3, tibble(weight = seq(30, 65, by = 1)))
Sim4.3 <-
sim(m4.3, tibble(weight = seq(30, 65, by = 1)))
str(Link4.3)
## num [1:1000, 1:36] 139 138 138 139 139 ...
str(Sim4.3)
## num [1:1000, 1:36] 146 136 137 138 133 ...
The result is a matrix. That’s not quite as nice to work with as the tidy data frame we made, but it sure was easy. So let’s learn how to work with matrices.
Each row is a posterior sample. Each column is one of the weights we specified. The columns are not labeled, so it is up to us to keep track of the weights we used.
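For example, to pull out all the simulated heights for 50 kg, we can index the matrix by position (weights_used is just a helper name for the weights we passed in):

weights_used <- seq(30, 65, by = 1)        # the weights behind the 36 columns
mean_hdi(Sim4.3[, weights_used == 50])     # 95% HDI for simulated heights at weight = 50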
We can apply a function to each row or column of a matrix using apply().
apply(Link4.3, 2, mean)
## [1] 138.7927 139.6959 140.5990 141.5021 142.4052 143.3083 144.2114 145.1146
## [9] 146.0177 146.9208 147.8239 148.7270 149.6302 150.5333 151.4364 152.3395
## [17] 153.2426 154.1458 155.0489 155.9520 156.8551 157.7582 158.6613 159.5645
## [25] 160.4676 161.3707 162.2738 163.1769 164.0801 164.9832 165.8863 166.7894
## [33] 167.6925 168.5957 169.4988 170.4019
apply(Sim4.3, 2, mean)
## [1] 138.9238 139.9080 140.6339 141.4232 142.2916 143.4852 143.8948 145.1874
## [9] 145.8804 146.8247 147.8853 148.8282 149.6284 150.4477 151.4615 152.3452
## [17] 153.3931 154.2658 155.2714 155.7802 156.7112 157.9563 158.6530 159.5821
## [25] 160.2852 161.1486 162.3408 163.2250 163.8414 164.9250 165.7233 166.7799
## [33] 167.6513 168.4459 169.2927 170.3008
apply(Sim4.3, 2, mean_hdi)
## [[1]]
## y ymin ymax .width .point .interval
## 1 138.9238 129.763 148.893 0.95 mean hdi
##
## [[2]]
## y ymin ymax .width .point .interval
## 1 139.908 130.1833 148.9155 0.95 mean hdi
##
## [[3]]
## y ymin ymax .width .point .interval
## 1 140.6339 131.6083 151.0646 0.95 mean hdi
##
## [[4]]
## y ymin ymax .width .point .interval
## 1 141.4232 131.6253 150.9545 0.95 mean hdi
##
## [[5]]
## y ymin ymax .width .point .interval
## 1 142.2916 132.953 153.6856 0.95 mean hdi
##
## [[6]]
## y ymin ymax .width .point .interval
## 1 143.4852 133.2422 152.4965 0.95 mean hdi
##
## [[7]]
## y ymin ymax .width .point .interval
## 1 143.8948 134.3319 154.1051 0.95 mean hdi
##
## [[8]]
## y ymin ymax .width .point .interval
## 1 145.1874 135.1368 155.1997 0.95 mean hdi
##
## [[9]]
## y ymin ymax .width .point .interval
## 1 145.8804 135.6577 155.0912 0.95 mean hdi
##
## [[10]]
## y ymin ymax .width .point .interval
## 1 146.8247 135.9126 155.2908 0.95 mean hdi
##
## [[11]]
## y ymin ymax .width .point .interval
## 1 147.8853 138.4962 158.0911 0.95 mean hdi
##
## [[12]]
## y ymin ymax .width .point .interval
## 1 148.8282 138.8428 158.2813 0.95 mean hdi
##
## [[13]]
## y ymin ymax .width .point .interval
## 1 149.6284 138.8307 159.9383 0.95 mean hdi
##
## [[14]]
## y ymin ymax .width .point .interval
## 1 150.4477 139.6388 159.7387 0.95 mean hdi
##
## [[15]]
## y ymin ymax .width .point .interval
## 1 151.4615 141.984 161.7831 0.95 mean hdi
##
## [[16]]
## y ymin ymax .width .point .interval
## 1 152.3452 142.3914 161.6529 0.95 mean hdi
##
## [[17]]
## y ymin ymax .width .point .interval
## 1 153.3931 143.6974 163.217 0.95 mean hdi
##
## [[18]]
## y ymin ymax .width .point .interval
## 1 154.2658 144.7579 164.3611 0.95 mean hdi
##
## [[19]]
## y ymin ymax .width .point .interval
## 1 155.2714 146.1595 166.0853 0.95 mean hdi
##
## [[20]]
## y ymin ymax .width .point .interval
## 1 155.7802 145.8525 165.1172 0.95 mean hdi
##
## [[21]]
## y ymin ymax .width .point .interval
## 1 156.7112 147.2643 166.2385 0.95 mean hdi
##
## [[22]]
## y ymin ymax .width .point .interval
## 1 157.9563 148.0792 167.8134 0.95 mean hdi
##
## [[23]]
## y ymin ymax .width .point .interval
## 1 158.653 148.7478 168.5664 0.95 mean hdi
##
## [[24]]
## y ymin ymax .width .point .interval
## 1 159.5821 150.3488 170.1216 0.95 mean hdi
##
## [[25]]
## y ymin ymax .width .point .interval
## 1 160.2852 151.2575 169.6647 0.95 mean hdi
##
## [[26]]
## y ymin ymax .width .point .interval
## 1 161.1486 151.3547 171.8448 0.95 mean hdi
##
## [[27]]
## y ymin ymax .width .point .interval
## 1 162.3408 151.162 171.8525 0.95 mean hdi
##
## [[28]]
## y ymin ymax .width .point .interval
## 1 163.225 153.0656 172.3235 0.95 mean hdi
##
## [[29]]
## y ymin ymax .width .point .interval
## 1 163.8414 152.7433 172.7558 0.95 mean hdi
##
## [[30]]
## y ymin ymax .width .point .interval
## 1 164.925 154.7307 174.4543 0.95 mean hdi
##
## [[31]]
## y ymin ymax .width .point .interval
## 1 165.7233 156.1104 176.3369 0.95 mean hdi
##
## [[32]]
## y ymin ymax .width .point .interval
## 1 166.7799 157.1139 175.9081 0.95 mean hdi
##
## [[33]]
## y ymin ymax .width .point .interval
## 1 167.6513 156.5707 177.1343 0.95 mean hdi
##
## [[34]]
## y ymin ymax .width .point .interval
## 1 168.4459 158.5129 178.6173 0.95 mean hdi
##
## [[35]]
## y ymin ymax .width .point .interval
## 1 169.2927 159.4048 179.02 0.95 mean hdi
##
## [[36]]
## y ymin ymax .width .point .interval
## 1 170.3008 160.614 178.9484 0.95 mean hdi
apply(Sim4.3, 2, mean_hdi) %>%
bind_rows() %>%
mutate(weight = seq(30, 65, by = 1))
## y ymin ymax .width .point .interval weight
## 1 138.9238 129.7630 148.8930 0.95 mean hdi 30
## 2 139.9080 130.1833 148.9155 0.95 mean hdi 31
## 3 140.6339 131.6083 151.0646 0.95 mean hdi 32
## 4 141.4232 131.6253 150.9545 0.95 mean hdi 33
## 5 142.2916 132.9530 153.6856 0.95 mean hdi 34
## 6 143.4852 133.2422 152.4965 0.95 mean hdi 35
## 7 143.8948 134.3319 154.1051 0.95 mean hdi 36
## 8 145.1874 135.1368 155.1997 0.95 mean hdi 37
## 9 145.8804 135.6577 155.0912 0.95 mean hdi 38
## 10 146.8247 135.9126 155.2908 0.95 mean hdi 39
## 11 147.8853 138.4962 158.0911 0.95 mean hdi 40
## 12 148.8282 138.8428 158.2813 0.95 mean hdi 41
## 13 149.6284 138.8307 159.9383 0.95 mean hdi 42
## 14 150.4477 139.6388 159.7387 0.95 mean hdi 43
## 15 151.4615 141.9840 161.7831 0.95 mean hdi 44
## 16 152.3452 142.3914 161.6529 0.95 mean hdi 45
## 17 153.3931 143.6974 163.2170 0.95 mean hdi 46
## 18 154.2658 144.7579 164.3611 0.95 mean hdi 47
## 19 155.2714 146.1595 166.0853 0.95 mean hdi 48
## 20 155.7802 145.8525 165.1172 0.95 mean hdi 49
## 21 156.7112 147.2643 166.2385 0.95 mean hdi 50
## 22 157.9563 148.0792 167.8134 0.95 mean hdi 51
## 23 158.6530 148.7478 168.5664 0.95 mean hdi 52
## 24 159.5821 150.3488 170.1216 0.95 mean hdi 53
## 25 160.2852 151.2575 169.6647 0.95 mean hdi 54
## 26 161.1486 151.3547 171.8448 0.95 mean hdi 55
## 27 162.3408 151.1620 171.8525 0.95 mean hdi 56
## 28 163.2250 153.0656 172.3235 0.95 mean hdi 57
## 29 163.8414 152.7433 172.7558 0.95 mean hdi 58
## 30 164.9250 154.7307 174.4543 0.95 mean hdi 59
## 31 165.7233 156.1104 176.3369 0.95 mean hdi 60
## 32 166.7799 157.1139 175.9081 0.95 mean hdi 61
## 33 167.6513 156.5707 177.1343 0.95 mean hdi 62
## 34 168.4459 158.5129 178.6173 0.95 mean hdi 63
## 35 169.2927 159.4048 179.0200 0.95 mean hdi 64
## 36 170.3008 160.6140 178.9484 0.95 mean hdi 65
Wow, that’s pretty slick. Let’s remake our plots using this method.
Avg <-
apply(Link4.3, 2, mean_hdi) %>%
bind_rows() %>%
mutate(weight = seq(30, 65, by = 1))
Ind <-
apply(Sim4.3, 2, mean_hdi) %>%
bind_rows() %>%
mutate(weight = seq(30, 65, by = 1))
gf_ribbon(ymin + ymax ~ weight, data = Ind, fill = ~"ind") %>%
gf_ribbon(ymin + ymax ~ weight, data = Avg, fill = ~"avg") %>%
gf_point(height ~ weight, data = HowellAdults, inherit = FALSE)
We will continue to address all of these as we go along.