Let’s create a function that computes the probability of having matching birthdays among \(n\) people. We’ll assume there are 365 equally likely birthdays.

The formula for this when there are \(n\) people is

\[ P(\mbox{matching birthday}) = 1 - P(\mbox{no matching birthday}) = 1 - \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots \frac{365 - n + 1}{365} \]

To create a function in R, we use the following template

my_function <- function(inputs) {
  # code goes here
}

Let’s try this with our birthday problem. As we will see, R has some nice features that allow us compute products of the sort we need for our birthday problem.

bday <- function(num_people) {
  1 - prod((365:(365 - num_people + 1)/365))
}

bday(10)
## [1] 0.1169482
bday(20)
## [1] 0.4114384
bday(25)
## [1] 0.5686997
bday(23)
## [1] 0.5072972

Looks like things are working. But R has the ability to receive a vector of inputs and return a vector of outputs. As written, our bday() function doesn’t do this, but we can ask R to convert it over to a vectorized function for us:

bday <- Vectorize(bday)  # convert to vectorized version
bday(1:23)               # compute multiple probabilities all at once
##  [1] 0.000000000 0.002739726 0.008204166 0.016355912 0.027135574 0.040462484
##  [7] 0.056235703 0.074335292 0.094623834 0.116948178 0.141141378 0.167024789
## [13] 0.194410275 0.223102512 0.252901320 0.283604005 0.315007665 0.346911418
## [19] 0.379118526 0.411438384 0.443688335 0.475695308 0.507297234

This allows us to generate all the data needed to create a nice graphical view of what is going on here:

gf_point(bday(1:50) ~ 1:50) |>
  gf_hline(yintercept = ~ 0.5, color = "red") |>
  gf_labs(title = "Probability of matching birthdays in a group of people",
          x = "number of people in the group",
          y = "probablitiy of matching birthdays"
  )