Goals for today

  1. Learn what B-splines are.

  2. See how to use B-splines in a regression model

  3. Use something better than a uniform distribution as the prior for \(\sigma\).

When do the cherries blossom?

A new data set

We would like to know if there is any interesting trend in this scatter plot showing the day of the year on which the cherry blossoms opened in Kyoto.

data(cherry_blossoms)     # rethinking requires that we load the data with data()
gf_point(doy ~ year, data = cherry_blossoms)
## Warning: Removed 388 rows containing missing values (geom_point).

Our model

Let \(d\) be the day of the year on which the cherries begin to blossom and let \(y\) be the year. We would like a model of the following form:

\[\begin{align*} d_i &\sim {\sf Norm}(\mu_i, \sigma) \\ \mu_i &= a + f(y_i; w) \\ a &\sim {\sf Norm}(100, 10) \\ w_j &\sim {\sf Norm}(0, 10) \\ \sigma &\sim {\sf Exp}(1) \\ \end{align*}\]

\(f(y; w)\) will be a function of the year that depends on some weights \(w\). Those weights will be parameters in our model. The type of function could in principle be anything, but we’ll be learning about B-spline functions today. This same framework would work for a polynomial regression or many other parameterized families of functions.

Notice too that our author has chosen a better prior for \(\sigma\) for this model. Why exponential?

  • Exponential distributions are never negative – just like standard deviations.

  • Exponential distributions are relatively simple: there is only one parameter to worry about.

    Exponential distributions are completely characterized by their standard deviation. The rate parameter is the reciprocal of the standard deviation. So using an exponential distribution amounts to saying that all we know about the prior is a standard deviation. The prior above has a standard deviation of 1, so the rate is also 1 and most of the distribution is between 0 and 4:

    pdist("exp", q = 4, rate = 1)

    ## [1] 0.9816844

    For any exponential distribution, that same proportion will be between 0 and 4 times the mean.

    pdist("exp", q = 4 * 3, rate = 1/3)

    ## [1] 0.9816844
  • There is no hard upper bound on the possible values as there is with a uniform distribution.

    This is an improvement over a uniform prior, which declares all values above its upper bound impossible.
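We can see the difference with base R’s distribution functions (using pexp() and punif() rather than pdist(), just to keep the check minimal):

```r
# A Unif(0, 4) prior makes values above 4 impossible; an Exp(1) prior does not.
punif(10, min = 0, max = 4)  # 1: no prior mass above 4
pexp(10, rate = 1)           # just under 1: a little mass remains above 10
```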

Other possibilities for a prior on \(\sigma\) include log-normal, gamma, half-normal, or half-t distributions.

B-splines

A B-spline (short for basis spline) of degree \(d\) is a continuous piecewise polynomial function where each piece is a polynomial of degree \(d\). The places where the pieces meet are called knots, and if \(d > 1\) the pieces meet smoothly at the knots (their derivatives match there). (Degree-1 splines are exempt: forcing lines to have matching slopes where they meet would force the whole spline to be a single line.)

B-splines are built from a collection of basis functions \(B_j(x)\) and weights.

\[ f(x) = \sum_j w_j B_j(x) \] In our regression model, we can let the weights \(w_j\) be parameters of the model.

The basis functions \(B_j\) have the following properties:

  1. At a knot, exactly \(d\) of them are non-zero.

  2. Between knots, exactly \(d + 1\) of them are non-zero.

  3. The sum of all the basis functions is 1 at each point in the interval:

\[\sum_j B_j(x) = 1\]

  4. If there are \(k\) knots and the degree is \(d\), there will be \(k + d + 1\) basis functions. (\(d+1\) are non-zero to the left of the first knot; then at each knot we drop one basis function and add a new one.)
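We can verify the \(k + d + 1\) counting rule directly with bs() from the splines package:

```r
library(splines)

x <- seq(0, 10, by = 0.1)
knots <- c(2, 3, 5, 8)    # k = 4 knots
d <- 2                    # degree
B <- bs(x, knots = knots, degree = d, intercept = TRUE)
ncol(B)                   # k + d + 1 = 7 basis functions
```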

Let’s see an example of degree 2 (piecewise parabolas).

library(splines)
x <- seq(0, 10, by = 0.1)
knots <- c(2, 3, 5, 8)
B <- bs(x, knots = knots, degree = 2, intercept = TRUE)
str(B)
##  'bs' num [1:101, 1:7] 1 0.902 0.81 0.722 0.64 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:7] "1" "2" "3" "4" ...
##  - attr(*, "degree")= int 2
##  - attr(*, "knots")= num [1:4] 2 3 5 8
##  - attr(*, "Boundary.knots")= num [1:2] 0 10
##  - attr(*, "intercept")= logi TRUE

The object B is a matrix with some additional information attached. This matrix has a row for each value of \(x\) and a column for each basis function \(B_j\). The value in row \(i\), column \(j\) is

\[ B_{ij} = B_j(x_i) \] Notice that we are abusing notation a bit. On the left side, \(B\) is a matrix. On the right side \(B_j\) is a function.

Summing along the rows should give 1 for each row. Let’s check to see.

B %>% apply(1, sum)
##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Let’s rearrange our matrix B into a data frame with columns indicating the basis function, the value of x, and the value of that basis function at that x. This will make it easy to plot the functions \(B_j\).

Bdf <- 
  expand_grid(
    basis = factor(1:ncol(B)),
    x = x,  
  ) %>%
  mutate(value = as.vector(B)) 

Bdf %>% head(4)
## # A tibble: 4 x 3
##   basis     x value
##   <fct> <dbl> <dbl>
## 1 1       0   1    
## 2 1       0.1 0.902
## 3 1       0.2 0.81 
## 4 1       0.3 0.722
Bdf %>%
  gf_line(value  ~ x, color = ~ basis, size = 1.2, alpha = 0.7) %>%
  gf_vline(xintercept = ~knots, data = NA, alpha = 0.5)

The spline we get will depend on the weights used to scale the basis functions. If we let our weights be 1, 2, 2, 1, -1, 3, 2, we get

weights <- c(1, 2, 2, 1, -1, 3, 2)
Bwdf <-
  expand_grid(
    basis = factor(1:ncol(B)),
    x = x,  
  ) %>%
  mutate(value = as.vector(B * rep(weights, each = nrow(B))))
  
Bwdf %>%
  gf_line(value  ~ x, color = ~ basis, size = 1, alpha = 0.7) %>%
  gf_vline(xintercept = ~knots, data = NA, alpha = 0.5) %>%
  gf_line( B %*% weights ~ x, data = NA, 
           size = 1.2, inherit = FALSE)

Advantages of B-splines over polynomials

  1. For the same degree, B-splines can wiggle more.

    A parabola has one peak or valley. A degree-2 B-spline can have multiple peaks and valleys (up to one per knot).

  2. The weights of B-splines have only local influence.

    The coefficients of a higher-degree polynomial influence every point along the curve. The weights of a B-spline only influence the curve over the range of a few knots.

    This makes it easier to come up with good priors. Priors on the weights can be used to say something about the “amplitude” of the spline, since the basis functions are non-negative and sum to 1, so the value of the spline function is bounded by the values of the weights:

\[ f(x_i) = \sum_j w_j \cdot B_j(x_i) \le \max(w_j) \sum B_j(x_i) = \max(w_j) \;. \]
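Here is a quick numeric check of that bound, using the same basis and weights as the degree-2 example above:

```r
library(splines)

x <- seq(0, 10, by = 0.1)
B <- bs(x, knots = c(2, 3, 5, 8), degree = 2, intercept = TRUE)
w <- c(1, 2, 2, 1, -1, 3, 2)
f <- as.vector(B %*% w)
range(f)                               # stays within range(w) = [-1, 3]
max(f) <= max(w) && min(f) >= min(w)   # TRUE
```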

So in many applications a low-degree spline (perhaps with quite a few knots) will be better than a high-degree polynomial. Cubic splines are a common choice (and the default for bs()).

Exploring some B-splines

Let’s turn our code above into a function that we can use to explore B-splines.

explore_splines <- 
  function(min, max, knots, degree = 2, intercept = TRUE, weights = c()) {
    K <- tibble(knot = knots)
    x <- seq(min, max, length.out = 501)
    B <- bs(x, knots = knots, degree = degree, intercept = intercept)
    Bdf <- 
      expand_grid(
        basis = factor(1:ncol(B)),
        x = x  
      )  %>%
      mutate(value = as.vector(B)) 
    
    if (length(weights) != ncol(B)) { 
      p <-
        gf_line(value  ~ x, data = Bdf, color = ~ basis, size = 1, alpha = 0.4) %>%
        gf_vline(xintercept = ~ knot, data = K, 
                 alpha = 0.5, inherit = FALSE, linetype = "dotted") %>%
        gf_point(0 ~ knot, data = K, alpha = 0.5, shape = 21, inherit = FALSE, 
                 size = 2.0) %>%
        gf_theme(legend.position = "none")
    } else {
      Bwdf <-
        expand_grid(
          basis = factor(1:ncol(B)),
          x = x,  
        ) %>%
        mutate(value = as.vector(B * rep(weights, each = nrow(B))))
      
      Spline <- tibble(
        x = x,
        y = as.vector(B %*% weights)   # drop the matrix class for plotting
      )
      p <-
        Bwdf %>%
        gf_line(value  ~ x, color = ~ basis, size = 1, alpha = 0.7) %>%
        gf_vline(xintercept = ~knot, data = K, alpha = 0.5) %>%
        gf_line(y ~ x, data = Spline,  size = 1.2, inherit = FALSE)
    }
    p %>%
      gf_labs(title = paste("degree =", degree, ";", 
                            "number of knots =", length(knots))
              )
  }
explore_splines(0, 10, knots = c(1, 4,5,6, 9), degree = 1, 
                weights = c(1,2,1,-1,2,1,2))

explore_splines(0, 10, knots = c(1, 4,5,6, 9), degree = 1)

explore_splines(0, 10, knots = c(1, 4,5,6, 9), degree = 2)

explore_splines(0, 10, knots = c(1, 4,5,6, 9), degree = 3)

explore_splines(0, 10, knots = c(1, 4,5,6, 9), degree = 1, 
                weights = c(1,2,1,-1,2,1,0))

explore_splines(0, 10, knots = c(1, 4,5,6, 9), degree = 2, 
                weights = c(1,2,1,-1,2,1,-1,2))

explore_splines(0, 10, knots = c(1, 4,5,6, 9), degree = 3, 
                weights = c(1,2,1,-1,2,1,-1,-1,2))

Using B-splines with quap()

In principle, using B-splines is no different from using a polynomial as our regression function: both are sums of coefficients/weights times transformations of \(x\). But since our information about splines comes in the form of a matrix, we’ll process things a bit differently. The easiest way uses matrix multiplication.

\[ f(x_i) = \sum_j w_j B_j(x_i) = \left(B w \right)_{i1} \] if we let \(w\) be a column matrix containing our weights.
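To see that the matrix product really computes the weighted sum, we can compare it against a by-hand version (a small sketch using the basis from earlier):

```r
library(splines)

x <- seq(0, 10, by = 0.1)
B <- bs(x, knots = c(2, 3, 5, 8), degree = 2, intercept = TRUE)
w <- c(1, 2, 2, 1, -1, 3, 2)

by_hand <- rowSums(sweep(B, 2, w, `*`))  # sum_j w_j * B_j(x_i), row by row
all.equal(as.vector(B %*% w), by_hand)   # TRUE
```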

Generally, when doing Bayesian analysis, it will be a good idea to remove missing data before proceeding.

Cherry <-
  cherry_blossoms %>% 
  drop_na(doy, year)

# A little sanity checking

tally(is.na(doy) ~ is.na(year), data = cherry_blossoms)
##           is.na(year)
## is.na(doy) TRUE FALSE
##      TRUE     0   388
##      FALSE    0   827
dim(cherry_blossoms)
## [1] 1215    5
dim(Cherry)
## [1] 827   5

Now we need to pick the type of spline to use. The two choices are the degree (let’s use 3) and the knots (let’s use 15 knots placed at quantiles of year, so that they divide the observations into 16 groups of equal size).

knots <- quantile(Cherry$year, (1:15)/16)
knots
##    6.25%    12.5%   18.75%      25%   31.25%    37.5%   43.75%      50% 
## 1017.625 1146.500 1230.875 1325.000 1413.250 1471.000 1525.375 1583.000 
##   56.25%    62.5%   68.75%      75%   81.25%    87.5%   93.75% 
## 1641.625 1696.250 1751.875 1803.500 1855.125 1908.750 1963.375
B <- bs(Cherry$year, degree = 3, knots = knots, intercept = TRUE)
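As a sanity check on the quantile idea, knots chosen this way split the observations into (nearly) equal-count groups. A sketch with simulated stand-in values (not the cherry data):

```r
set.seed(123)
fake_year <- runif(827, 800, 2020)           # stand-in for Cherry$year
k <- quantile(fake_year, (1:15)/16)          # 15 interior knots
counts <- table(cut(fake_year, c(-Inf, k, Inf)))
range(counts)                                # each bin holds ~ 827/16 observations
```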

Now we just need to figure out how to get B and some weights into our model.

m4.7 <- quap(
  data = list(doy = Cherry$doy, B = B),
  alist(
    doy ~ dnorm(mu, sigma),
    mu <-  a + B %*% w,
    a ~ dnorm(100, 10),
    w ~ dnorm(0, 10),
    sigma ~ dexp(1)
  ),
  start = list(w = rep(0, ncol(B)))  # this tells quap how many w's there are
)

Notice:

  • Our data argument is a list containing the day of year vector and our spline matrix.
  • We need to specify how many slots are in w. We do this by providing 19 starting values. (19 because we have 15 knots, and with degree 3 and an intercept there are \(15 + 3 + 1 = 19\) basis functions.)

precis(m4.7)
## 19 vector or matrix parameters hidden. Use depth=2 to show them.
##             mean        sd      5.5%      94.5%
## a     103.441521 2.2489688 99.847235 107.035808
## sigma   5.868414 0.1435574  5.638981   6.097847

OK. Let’s do what it says…

precis(m4.7, depth = 2)
##              mean        sd        5.5%      94.5%
## w[1]   -3.5420212 3.8698734  -9.7268264   2.642784
## w[2]    0.1882338 3.9190200  -6.0751171   6.451585
## w[3]   -3.0237397 3.6027115  -8.7815686   2.734089
## w[4]    5.5428091 2.8361836   1.0100399  10.075578
## w[5]   -0.9004366 2.7736190  -5.3332154   3.532342
## w[6]    4.3635325 2.9545622  -0.3584285   9.085493
## w[7]   -1.6882023 2.8601864  -6.2593325   2.882928
## w[8]   -1.8467911 2.7665035  -6.2681980   2.574616
## w[9]    7.2139127 2.7859633   2.7614052  11.666420
## w[10]  -0.5498741 2.8661616  -5.1305539   4.030806
## w[11]   1.3505920 2.8822565  -3.2558106   5.956995
## w[12]   6.0919798 2.8474981   1.5411278  10.642832
## w[13]  -0.4402277 2.8317214  -4.9658655   4.085410
## w[14]   3.8937334 2.8309592  -0.6306862   8.418153
## w[15]   3.1556860 2.8634753  -1.4207005   7.732073
## w[16]   0.9770392 2.9827298  -3.7899391   5.744018
## w[17]  -2.6373549 3.2920866  -7.8987450   2.624035
## w[18]  -6.7515215 3.3958837 -12.1787995  -1.324244
## w[19]  -7.9400214 3.2287718 -13.1002224  -2.779820
## a     103.4415211 2.2489688  99.8472346 107.035808
## sigma   5.8684140 0.1435574   5.6389815   6.097847

That shows us all 21 parameters of our model, but it doesn’t really help us understand it much. Let’s make some posterior plots.

Link4.7 <- link(m4.7)
Avg4.7 <-
  apply(Link4.7, 2, mean_hdi) %>%
  bind_rows() %>%
  bind_cols(Cherry)
gf_point(doy ~ year, data = Cherry) %>%
  gf_ribbon(ymax + ymin ~ year, data = Avg4.7)

Sim4.7 <- sim(m4.7)
Ind4.7 <-
  apply(Sim4.7, 2, mean_hdi) %>%
  bind_rows() %>%
  bind_cols(Cherry)
gf_point(doy ~ year, data = Cherry) %>%
  gf_ribbon(ymax + ymin ~ year, data = Ind4.7)

W <- precis(m4.7, depth = 2)$mean %>% head(-2)

explore_splines(min = min(Cherry$year), max = max(Cherry$year), knots = knots,
                degree = 3, intercept = TRUE)

explore_splines(min = min(Cherry$year), max = max(Cherry$year), knots = knots,
                degree = 3, intercept = TRUE, weights = W)