What’s in the bag?

There are five bags. Each one has four milk jug lids that are either blue or green. Each bag has a different number of blue lids (0, 1, 2, 3, or 4). If we randomly select a bag and then sample (with replacement) from the bag, what can we say about which bag we selected?

We won’t know for sure which bag it is, but some bags will be more likely than others. Can we quantify this?

Survey says…

Well, not a survey really, but what do the data say?

Suppose we draw one lid from the bag and it is Green. Which bag is most likely? Which is least likely?

draw_garden() + xlim(0, 1.1)

Let’s get some more data

Let’s draw two more lids (replacing the previously drawn lid and mixing the bag thoroughly each time).

Now our data are: GGG

A little portion of the garden

Let’s consider for a moment just the bag that has 1 Blue (and 3 Greens). How many ways are there to get GGG?
Here’s a map through the garden of forking paths for this bag.

draw_garden(bags = 1)

But not all paths lead to GGG. Let’s highlight just the ones that do.

draw_garden(bags = 1, pattern = lids)

Counting up, we see there are 3 * 3 * 3 = 27 such ways.
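
The count can be checked by brute force. The following is a minimal base R sketch, not the code behind `draw_garden()`; the labels `B1`, `G1`, `G2`, `G3` and the variable names are hypothetical. It lists all 4 * 4 * 4 = 64 paths of three draws with replacement from the bag with 1 blue lid and counts those that read GGG.

```r
# Hypothetical labels for the four lids in the bag with 1 blue and 3 green.
lids_in_bag <- c("B1", "G1", "G2", "G3")

# Every path of three draws with replacement: 4 * 4 * 4 = 64 rows.
paths <- expand.grid(
  draw1 = lids_in_bag,
  draw2 = lids_in_bag,
  draw3 = lids_in_bag,
  stringsAsFactors = FALSE
)

# Reduce each lid to its color (first character), one string per path.
colors <- apply(paths, 1, function(p) paste(substr(p, 1, 1), collapse = ""))

n_paths <- nrow(paths)        # all paths through this part of the garden
n_ggg <- sum(colors == "GGG") # paths consistent with the data
n_ggg
```

Enumerating paths like this only works for tiny examples, since the number of rows grows by a factor of 4 with every additional draw.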

A view of the whole garden

Here’s the whole garden.

draw_garden(bags = 0:4)

Again, let’s highlight just the paths that lead to GGG.

draw_garden(bags = 0:4, pattern = lids)

It’s not too hard to count up the number of paths that lead to GGG, but before we do that, let’s come up with a bookkeeping system that keeps us organized.

Another View

As we collect more data, it is soon going to be impossible to create maps of the whole garden and to count up possible ways of navigating the map to get to the data. We need some improved methods. The key insight is that we obtain the total number of ways for a given bag by multiplying the number of ways to do each step.

ways("GGG")

##   bag prior     terms ways prob  probf
## 1   0     1 4 * 4 * 4   64 0.64 64/100
## 2   1     1 3 * 3 * 3   27 0.27 27/100
## 3   2     1 2 * 2 * 2    8 0.08  8/100
## 4   3     1 1 * 1 * 1    1 0.01  1/100
## 5   4     1 0 * 0 * 0    0 0.00  0/100
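
The multiplication rule is simple enough to write out by hand. The sketch below is an assumption about what such bookkeeping might look like, not the source of the `ways()` function used above; `count_ways` is a hypothetical helper that assumes 4 lids per bag and data given as a string of `G` and `B`.

```r
# Hypothetical helper (not the ways() function used in the text).
# Assumes every bag holds 4 lids and the data are a string of "G" and "B".
count_ways <- function(data, blue = 0:4, lids_per_bag = 4) {
  draws <- strsplit(data, "")[[1]]
  sapply(blue, function(b) {
    # Ways to produce each draw from a bag with b blue lids.
    per_draw <- ifelse(draws == "B", b, lids_per_bag - b)
    # Total ways for the bag: multiply the ways for each step.
    prod(per_draw)
  })
}

ggg_ways <- count_ways("GGG")
ggg_ways
```

The result should match the `ways` column in the table above, one entry per bag in order of the number of blue lids.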

Another bit of data

Let’s draw another lid. Now our data are GGGB. We can use the same bookkeeping to tally up the ways.

ways("GGGB")

##   bag prior         terms ways       prob probf
## 1   0     1 4 * 4 * 4 * 0    0 0.00000000  0/46
## 2   1     1 3 * 3 * 3 * 1   27 0.58695652 27/46
## 3   2     1 2 * 2 * 2 * 2   16 0.34782609 16/46
## 4   3     1 1 * 1 * 1 * 3    3 0.06521739  3/46
## 5   4     1 0 * 0 * 0 * 4    0 0.00000000  0/46

But we don’t have to start all over. We can also start from what we knew when we had seen only GGG:

ways("GGG")

##   bag prior     terms ways prob  probf
## 1   0     1 4 * 4 * 4   64 0.64 64/100
## 2   1     1 3 * 3 * 3   27 0.27 27/100
## 3   2     1 2 * 2 * 2    8 0.08  8/100
## 4   3     1 1 * 1 * 1    1 0.01  1/100
## 5   4     1 0 * 0 * 0    0 0.00  0/100

ways("B", prior = ways("GGG")$ways)

##   bag prior terms ways       prob probf
## 1   0    64     0    0 0.00000000  0/46
## 2   1    27     1   27 0.58695652 27/46
## 3   2     8     2   16 0.34782609 16/46
## 4   3     1     3    3 0.06521739  3/46
## 5   4     0     4    0 0.00000000  0/46
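
The two routes agree because of the multiplication rule: the ways for GGGB are the ways for GGG times the ways for one more B. A minimal check in base R, assuming 4 lids per bag as above (the variable names are illustrative, not from the original code):

```r
blue <- 0:4                  # number of blue lids in each bag
ways_ggg <- (4 - blue)^3     # ways to see GGG from each bag
ways_b <- blue               # ways to draw one blue lid from each bag

# Updating: treat the GGG counts as the prior and multiply in the new draw.
updated <- ways_ggg * ways_b

# Starting over: multiply the ways for all four draws in one go.
from_scratch <- sapply(blue, function(b) prod(c(4 - b, 4 - b, 4 - b, b)))

all(updated == from_scratch)
```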

This means

We can compute the total number of ways (or probabilities) without creating all of the paths.
We can update to include new data by taking the information we had before as prior information and starting from there.
We can handle (or get a computer to handle for us) more than just a small number of bags (stay tuned).

Also, it is only the relative number of ways that matters. We can convert to probability by dividing by the total number of ways.
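
As a small worked example of that conversion, here are the GGGB counts from the table above turned into probabilities. This is a sketch with illustrative variable names.

```r
ways_gggb <- c(0, 27, 16, 3, 0)      # ways for bags with 0 to 4 blue lids
prob <- ways_gggb / sum(ways_gggb)   # divide each count by the total
round(prob, 3)

# Only relative counts matter: scaling every count by the same factor
# leaves the probabilities unchanged.
scaled_prob <- (10 * ways_gggb) / sum(10 * ways_gggb)
isTRUE(all.equal(prob, scaled_prob))
```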

Here is a graphical view of how our probabilities for each bag are updated as we see each bit of data:
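
The numbers behind such a view can be tabulated directly. The sketch below is an illustration in base R, not the code that produced the original graphic; it assumes the draws G, G, G, B from above, a flat prior over the five bags, and hypothetical variable names.

```r
blue <- 0:4                        # blue lids in each bag
draws <- c("G", "G", "G", "B")     # data in the order they were seen
prob <- rep(1 / 5, 5)              # flat prior: every bag equally likely

# Row labels: the data seen so far ("G", "GG", "GGG", "GGGB").
seen <- sapply(seq_along(draws), function(i) paste(draws[1:i], collapse = ""))

# One row per draw, one column per bag.
history <- matrix(NA_real_, nrow = length(draws), ncol = length(blue),
                  dimnames = list(seen, paste0("bag", blue)))

for (i in seq_along(draws)) {
  # Ways each bag can produce this draw.
  step_ways <- if (draws[i] == "B") blue else 4 - blue
  # Update: weight the current probabilities by the new ways, then rescale.
  prob <- prob * step_ways / sum(prob * step_ways)
  history[i, ] <- prob
}

round(history, 3)
```

Each row is the set of bag probabilities after one more lid has been seen, so the `GGG` and `GGGB` rows should agree with the `prob` columns in the tables above.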

Garden of Forking Paths

Statistical Rethinking, Chapter 2

January 30, 2017