3 R Finger Food

No matter what language you work in, there are some things you will need to look up from time to time. Even package authors look at the help pages for things that they wrote!

But there are also things that you should be comfortable with and able to do without looking anything up; they are just “in your fingers.”

This chapter includes some things that you might consider making “finger food.” Even if you don't, they are worth knowing about so that you know what to look up when you need them.

3.1 reorder()

There are many reasons you might want to put the levels of a categorical variable in a particular order. Sometimes you just need to manually code that. But sometimes the order is based on a calculation:

  • order groups by the mean of some value,
  • order the bars in a plot by their length,
  • etc.

This is just what reorder() does for you.
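Before turning to real data, here is a minimal sketch on a toy example. The vectors `group` and `value` are made up for illustration; they are not part of any data set used in this chapter.

```r
# Toy data (made up): three groups with different means
group <- c("a", "a", "b", "b", "c", "c")
value <- c(5, 7, 1, 3, 9, 11)

# Without reorder(), the levels come out in alphabetical order
default_levels <- levels(factor(group))
default_levels

# With reorder(), the levels are ordered by the mean of `value`
# within each group (mean is the default summary function)
by_mean <- reorder(group, value)
levels(by_mean)
```

The group means here are 6 for a, 2 for b, and 10 for c, so b should come first and c last.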

3.1.1 A few examples

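library(plotly)
library(mosaic)
# boxplots with sectors in their default level order
CPS85 |>
  plot_ly() |>
  add_boxplot(y = ~ wage, x = ~ sector)
# order the sectors by their maximum wage
CPS85 |>
  mutate(sector = reorder(sector, wage, max)) |>
  plot_ly() |>
  add_boxplot(y = ~ wage, x = ~ sector)
# bar chart of counts, sectors in their default level order
CPS85 |>
  plot_ly() |>
  add_histogram(x = ~ sector)
# order the bars by count, shortest first
CPS85 |>
  mutate(sector = reorder(sector, sector, length)) |>
  plot_ly() |>
  add_histogram(x = ~ sector)
# negate the count to put the longest bars first
CPS85 |>
  mutate(sector = reorder(sector, sector, function(x) - length(x))) |>
  plot_ly() |>
  add_histogram(x = ~ sector)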

3.1.2 How it works

reorder() takes three arguments: two vectors and a function FUN.

  • The first vector will be converted to a factor. Each unique value of this vector will be a level of the factor.
  • The second vector provides auxiliary information used to create the ordering. This vector should be the same length as the first vector.
  • The values of the second vector are grouped according to the unique values of the first vector.
  • FUN() should take a vector of values and return a single number. This function is applied to each group of the second vector. If FUN is omitted, mean() is used.
  • The levels are ordered according to the values returned by FUN(), from smallest to largest.
  • Any additional arguments are passed along to FUN().
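The steps above can be sketched by hand with tapply(). This is only an illustration of the idea, not necessarily how reorder() is implemented internally, and the vectors `x` and `aux` are made-up toy data.

```r
# Toy data (made up): a grouping vector and an auxiliary numeric vector
x   <- c("lo", "hi", "mid", "hi", "lo", "mid")
aux <- c(1, 10, 5, 12, 3, 4)

# Step 1: apply FUN (here, median) to the values of `aux`
# within each group defined by `x`
scores <- tapply(aux, x, median)

# Step 2: order the group names by their scores, smallest first,
# and use that as the level order of the factor
by_hand <- factor(x, levels = names(scores)[order(scores)])

# The hand-built factor and reorder() should give the same level order
levels(by_hand)
levels(reorder(x, aux, median))
```

In practice reorder() is the shorter and clearer way to write this; the sketch is only meant to make the grouping and sorting steps visible.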

3.1.3 One more example, using additional arguments

# use trimmed mean and avoid problems if there are missing values
CPS85 |>
  mutate(sector = reorder(sector, wage, mean, trim = 0.10, na.rm = TRUE)) |>
  plot_ly() |>
  add_boxplot(y = ~ wage, x = ~ sector) |>
  add_markers(x = ~sector, y = ~trimmed_mean_wage, color = ~ I("yellow"), 
              data = CPS85 |> group_by(sector) |> summarise(trimmed_mean_wage = mean(wage, trim = 0.10, na.rm = TRUE))
  )
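To see what passing na.rm = TRUE through to FUN() changes, here is a small sketch on made-up data containing a missing value. The vectors `g` and `v` are hypothetical and unrelated to CPS85. The "scores" attribute that reorder() attaches to its result holds the per-group values of FUN().

```r
# Toy data (made up): group "a" contains a missing value
g <- c("a", "a", "b", "b", "c", "c")
v <- c(4, NA, 1, 2, 8, 9)

# Without na.rm, mean() returns NA for group "a",
# so that group has no usable score for ordering
scores_with_na <- attr(reorder(g, v, mean), "scores")
scores_with_na

# na.rm = TRUE is passed through to mean(), so group "a"
# gets a score of 4 and is ordered between "b" and "c"
cleaned <- reorder(g, v, mean, na.rm = TRUE)
levels(cleaned)
```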