Goal: Learn how to use R to simulate flipping coins, rolling dice, and dealing cards – three common examples used in probability examples.

Flipping coins

To simulate flipping coins, we will use two important R functions:


Example 1

Toss a fair coin 20 times. Let \(X\) be the number of heads.

rflip(20)
## 
## Flipping 20 coins [ Prob(Heads) = 0.5 ] ...
## 
## H H H H T H H T H T T H H H T T H T H T
## 
## Number of Heads: 12 [Proportion Heads: 0.6]

Let’s do that several times

do(5) * rflip(20)
##    n heads tails prop
## 1 20    13     7 0.65
## 2 20    11     9 0.55
## 3 20    11     9 0.55
## 4 20     9    11 0.45
## 5 20    13     7 0.65

Notice that do() records the information in a nice summarized format.

If we want to estimate probability, we need to repeat this many times. Let’s do it 10,000 times.

Flips20 <- do(10000) * rflip(20)

Let’s look at the first few rows to make sure we know what we have.1

head(Flips20) %>% pander()
n heads tails prop
20 8 12 0.4
20 9 11 0.45
20 8 12 0.4
20 8 12 0.4
20 7 13 0.35
20 16 4 0.8

We can let R count the number of times each value from 0 to 20 is produced by our 10000 simulations and present the information to us in a table or as a histogram

tally( ~ heads, data = Flips20) %>% pander()
Table continues below
2 3 4 5 6 7 8 9 10 11 12 13 14
2 10 42 136 375 756 1200 1597 1738 1581 1202 791 381
15 16 17 18 19
139 41 6 2 1
gf_histogram( ~ heads, data = Flips20, binwidth = 1)

Perhaps we would rather have proportions rather than counts.

tally( ~ heads, data = Flips20, format = "prop") %>% pander()
Table continues below
2 3 4 5 6 7 8 9 10
2e-04 0.001 0.0042 0.0136 0.0375 0.0756 0.12 0.1597 0.1738
11 12 13 14 15 16 17 18 19
0.1581 0.1202 0.0791 0.0381 0.0139 0.0041 6e-04 2e-04 1e-04
gf_histogram( ~ heads, data = Flips20, binwidth = 1)

The table above allows to to calculate probabilities about the number of heads in 20 tosses of a fair coin.


Example 2

Toss a biased coin 10 times and let \(X\) = number of heads.
Assume the biased coin has a probability of 0.25 of producing a head.

We simulate 10 tosses of the biased coin.

rflip(10, prob = 0.25)
## 
## Flipping 10 coins [ Prob(Heads) = 0.25 ] ...
## 
## T T T T T H T T T H
## 
## Number of Heads: 2 [Proportion Heads: 0.2]

Now let’s repeat that 10,000 times, find the estimated probability distribution for \(X\), and make a histogram.

Flips10 <- do(10000) * rflip(10, prob=.25)
tally( ~ heads, data = Flips10, format ="prop") %>% pander()
0 1 2 3 4 5 6 7 8
0.0589 0.1814 0.2894 0.2501 0.1417 0.0582 0.017 0.0028 5e-04
gf_dhistogram( ~ heads, data = Flips10, binwidth = 1)

Q. What is (approximately) the probability of getting at least 5 heads?

We could answer this by doing the arithmetic ourselves from the table above, but we should let R do that for us. Here’s how.

tally( ~(heads >= 5), data = Flips10)
## (heads >= 5)
##  TRUE FALSE 
##   785  9215

Your Turn

What is the probability of getting 3 or fewer heads?


Example 3

We can use coin tosses to simulate anything that has two possible outcomes. Here’s an example about shooting free throws.

Jo has a free throw shooting average of 85%. She has been fouled in the act of shooting a 3-point shot as time runs out. Jo has 3 free throws coming. Her team is behind by 2 points. Estimate the probability that her team does not lose in regulation time. Use 10000 simulations.

FreeThrows <- do(10000) * rflip(3, prob = 0.85)
tally( ~(heads >= 2), data = FreeThrows, format = "prop") 
## (heads >= 2)
##   TRUE  FALSE 
## 0.9417 0.0583

Rolling dice

Example 4

Roll of a pair of fair dice. Let \(S\) = sum of the results.

1:6
## [1] 1 2 3 4 5 6

We can generate the result of tossing a fair die 10 times

resample(1:6, 10)
##  [1] 1 1 4 4 2 5 3 1 2 4

We now simulate 10000 tosses of a fair die twice.2

TwoDice <-
  tibble(
    die1 = resample(1:6, 10000),
    die2 = resample(1:6, 10000),
    Sum = die1 + die2)                # add them together, componentwise
head(TwoDice) %>% pander()
die1 die2 Sum
2 2 4
3 2 5
6 4 10
6 3 9
2 3 5
5 2 7

We create the estimated probability distribution for \(S\) and the histogram.

tally( ~ Sum, data = TwoDice, format = "prop") %>% pander()
Table continues below
2 3 4 5 6 7 8 9 10
0.0261 0.0581 0.0846 0.1109 0.1395 0.1655 0.137 0.1121 0.0816
11 12
0.057 0.0276
gf_dhistogram( ~ Sum, data = TwoDice, binwidth = 1)

Your turn

1. A pair of fair dice are tossed. Use 10000 simulations to estimate the probability that the dice turn up with the same value.

tally( ~(die1 == die2), data = TwoDice, format = "prop") %>% pander()
TRUE FALSE
0.1616 0.8384

2. Estimate the probability that the sum of two dice is more than 9.

tally( ~(Sum > 9), data = TwoDice, format = "prop")
## (Sum > 9)
##   TRUE  FALSE 
## 0.1662 0.8338

Dealing cards

An ordinary deck of playing cards contains 52 cards with 4 suits (clubs, diamonds, hearts, spades) of 13 cards each (2 – 10, Jack, Queen, King, and Ace). The clubs and spades are black suits; hearts and diamonds are red.

We can simulate a deck of cards using the numbers 1:52. (The mosaic package includes a slightly fancier thing called Cards, but it is harder to do arithmetic with that.)

For example, here are two 5-card hands.

set.seed(1234)          # so we get the same random draw each time we run this
sample(1:52, 5)
## [1] 28 16 22 37 44
sample(Cards, 5)        # 9S = 9 of spades; etc.
## [1] "9S"  "10C" "6C"  "KH"  "4D"

Note that we are using sample() here rather than resample().
resample() allows for a value to be repeated (“sampling with replacement). But with cards, you can only get each card once.
sample() does not allow for repetition (”samping without replacement"). This ensures that we get 5 different cards each time.


Example 5

If we deal 3 cards from a standard deck, what is the probability that all three are spades?

Let’s represent spades as the cards 1–13. So the questions becomes what is the probability of getting three numbers in that range. R will create TRUE and FALSE values if we do a comparison, and sum() will treat TRUE like 1 and FALSE like 0. Putting that all together, it is pretty easy to count how many spades are in a hand of three cards.

hand1 <- sample(1:52, 3) 
hand1
## [1]  4 34 39
hand1 <= 13
## [1]  TRUE FALSE FALSE
sum(hand1 <= 13)
## [1] 1

We can put this altogether in one line. That will make easier to repeat things.3

sum(sample(1:52, 3) <= 13)
## [1] 0
sum(sample(1:52, 3) <= 13)
## [1] 2
sum(sample(1:52, 3) <= 13)
## [1] 0

We just need to do that lots of times.

set.seed(123456)
HandsOfThree <- do(10000) * sum(resample(1:52, 3) <= 13)
head(HandsOfThree) %>% pander()
sum
1
2
1
1
1
1

Finally, we can use tally() to find the proportions we are looking for.

tally( ~ sum, data = HandsOfThree, format = "prop")
## sum
##      0      1      2      3 
## 0.4292 0.4192 0.1375 0.0141
gf_dhistogram( ~ sum, data = HandsOfThree, binwidth = 1)

A Statistical Question

The simulation method only provides an estimate of the probability. If you recreate the examples abovce with a different random draw (either don’t use set.seed() or use a different number in set.seed()), you will get somewhat different results. We would like to know things like

These sorts of quetsion arise whenever we try to estimate something and form one of the big ideas in statistics. Stay tuned for answers.

Exercises

1. Coins
a. A fair coin is tossed 8 times. Use 10,000 simulations to estimate the probability distribution for the random variable X = number of heads.
b. Using your result in (a), estimate the probability that at least 5 heads are produced. c. Repeat (a) and (b) for a biased coin where the probability of a head is .66.

2. Dice Three fair dice are tossed. Let X be the sum of the values produced.

  1. Use 10,000 simulations to estimate the probability distribution of X.
  2. Use your result in (a) to estimate the probability that the sum is less than 10.

3. Softball Carol has a batting average in softball of .440. (This means she gets a hit 44% of the time.) If she has 5 at bats in a game, what is the probability that she gets exactly 2 hits? What is the probability that she gets at least 2 hits? Simulate her performance in 10000 games to estimate these probabilities.

4. Cards Use 10000 simulations to estimate the probability that in a five card hand, at least four of the cards are red (either a heart or a diamond). Include the R-commands you used and the outputs.


  1. The pander() function is in the pander package. It creates prettier output for things like tables and data frames.↩︎

  2. The tibble() function creates data frames. You simply specify the values for each of the variables in the data frame. In this case, we are creating die1 and die2 randomly, and then creating S by adding those two values together.↩︎

  3. If you want to repeat multiple lines of code, you just need to surround them in curly braces { }. So it isn’t really that hard.
    do(3) * {
      hand1 <- sample(1:52, 3) 
      hand1
      hand1 <= 13
      sum(hand1 <= 13)
    }
    ##   result
    ## 1      0
    ## 2      1
    ## 3      1
    ↩︎