Skip to contents

Flights categorized by destination city, airline, and whether or not the flight was on time.

Format

A data frame with 11000 observations on the following 3 variables.

airport

a factor with levels LosAngeles, Phoenix, SanDiego, SanFrancisco, Seattle

result

a factor with levels Delayed, OnTime

airline

a factor with levels Alaska, AmericaWest

Source

Barnett, Arnold. 1994. ``How numbers can trick you.'' Technology Review, vol. 97, no. 7, pp. 38--45.

References

These and similar data appear in many text books under the topic of Simpson's paradox.

Examples


tally(
  airline ~ result, data = AirlineArrival, 
  format = "perc", margins = TRUE)
#>              result
#> airline         Delayed    OnTime
#>   Alaska       38.89752  33.71087
#>   AmericaWest  61.10248  66.28913
#>   Total       100.00000 100.00000
tally(
  result ~ airline + airport, 
  data = AirlineArrival, format = "perc", margins = TRUE)
#> , , airport = LosAngeles
#> 
#>          airline
#> result        Alaska AmericaWest
#>   Delayed  11.091234   14.426634
#>   OnTime   88.908766   85.573366
#>   Total   100.000000  100.000000
#> 
#> , , airport = Phoenix
#> 
#>          airline
#> result        Alaska AmericaWest
#>   Delayed   5.150215    7.897241
#>   OnTime   94.849785   92.102759
#>   Total   100.000000  100.000000
#> 
#> , , airport = SanDiego
#> 
#>          airline
#> result        Alaska AmericaWest
#>   Delayed   8.620690   14.508929
#>   OnTime   91.379310   85.491071
#>   Total   100.000000  100.000000
#> 
#> , , airport = SanFrancisco
#> 
#>          airline
#> result        Alaska AmericaWest
#>   Delayed  16.859504   28.730512
#>   OnTime   83.140496   71.269488
#>   Total   100.000000  100.000000
#> 
#> , , airport = Seattle
#> 
#>          airline
#> result        Alaska AmericaWest
#>   Delayed  14.212488   23.282443
#>   OnTime   85.787512   76.717557
#>   Total   100.000000  100.000000
#> 
AirlineArrival2 <- 
  AirlineArrival %>% 
  group_by(airport, airline, result) %>% 
  summarise(count = n()) %>%
  group_by(airport, airline) %>%
  mutate(total = sum(count), percent = count/total * 100) %>% 
  filter(result == "Delayed") 
#> `summarise()` has grouped output by 'airport', 'airline'. You can override
#> using the `.groups` argument.
AirlineArrival3 <- 
  AirlineArrival %>% 
  group_by(airline, result) %>% 
  summarise(count = n()) %>%
  group_by(airline) %>%
  mutate(total = sum(count), percent = count/total * 100) %>% 
  filter(result == "Delayed") 
#> `summarise()` has grouped output by 'airline'. You can override using the
#> `.groups` argument.
  gf_line(percent ~ airport, color = ~ airline, group = ~ airline, 
          data = AirlineArrival2) %>%
    gf_point(percent ~ airport, color = ~ airline, size = ~total, 
             data = AirlineArrival2) %>%
    gf_hline(yintercept = ~ percent, color = ~airline, 
             data = AirlineArrival3, linetype = "dashed") %>%
    gf_labs(y = "percent delayed")