3 R Finger Food
No matter what language you work in, there are some things you will need to look up from time to time. Even package authors look at the help pages for things that they wrote!
But there are also things that you should be comfortable with and able to do without looking anything up; they're just “in your fingers.”
This chapter includes some things that you might consider making “finger food.” In any case, they are useful things to know about, in case you ever need to look them up.
3.1 reorder()
There are many reasons you might want to put the levels of a categorical variable in a particular order. Sometimes you just need to manually code that. But sometimes the order is based on a calculation:
- order groups by the mean of some value,
- order the bars in a plot by their length,
- etc.
This is just what reorder() does for you.
3.1.1 A few examples
library(plotly)
library(mosaic)

# wage by sector, with sectors in their default level order
CPS85 |>
  plot_ly() |>
  add_boxplot(y = ~ wage, x = ~ sector)

# sectors ordered by maximum wage
CPS85 |>
  mutate(sector = reorder(sector, wage, max)) |>
  plot_ly() |>
  add_boxplot(y = ~ wage, x = ~ sector)

# counts per sector, default level order
CPS85 |>
  plot_ly() |>
  add_histogram(x = ~ sector)

# sectors ordered by increasing count
CPS85 |>
  mutate(sector = reorder(sector, sector, length)) |>
  plot_ly() |>
  add_histogram(x = ~ sector)

# sectors ordered by decreasing count (negate the count)
CPS85 |>
  mutate(sector = reorder(sector, sector, function(x) -length(x))) |>
  plot_ly() |>
  add_histogram(x = ~ sector)
3.1.2 How it works
reorder() takes three arguments: two vectors and a function FUN.
- The first vector will be converted to a factor. Each unique value of this vector will be a level of the factor.
- The second vector provides auxiliary information used to create the ordering. This vector should be the same length as the first vector.
- The values of the second vector are grouped according to the unique values of the first vector. FUN() should take a vector of values in and return a single number. This function is applied to each group of the second vector.
- The levels are ordered according to the values of FUN(), from smallest to largest.
- Additional arguments can be passed in and become additional arguments to the function.
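The steps above can be traced on a small toy example. This is only a sketch: the vectors group and value are made up for illustration and are not part of CPS85. It uses tapply() to show the per-group numbers that FUN() produces, and then shows that reorder() sorts the levels by those numbers.

```r
# Toy data (hypothetical, for illustration only)
group <- c("a", "b", "c", "a", "b", "c")
value <- c(10, 1, 5, 12, 3, 7)

# Apply FUN (here, mean) to the values within each group
group_means <- tapply(value, group, mean)
group_means   # a = 11, b = 2, c = 6

# reorder() returns a factor whose levels are sorted by those group means
f <- reorder(group, value, mean)
levels(f)     # "b" "c" "a"

# The per-group values are stored in the "scores" attribute of the result
attr(f, "scores")
```

The data themselves are unchanged; only the order of the levels differs from the default alphabetical order.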
3.1.3 One more example, using additional arguments
# use trimmed mean and avoid problems if there are missing values
CPS85 |>
  mutate(sector = reorder(sector, wage, mean, trim = 0.10, na.rm = TRUE)) |>
  plot_ly() |>
  add_boxplot(y = ~ wage, x = ~ sector) |>
  add_markers(
    x = ~ sector, y = ~ trimmed_mean_wage, color = ~ I("yellow"),
    # overlay the same trimmed mean that was used to order the sectors
    data = CPS85 |>
      group_by(sector) |>
      summarise(trimmed_mean_wage = mean(wage, trim = 0.10, na.rm = TRUE))
  )
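To see the effect of forwarding an extra argument without drawing a plot, here is a minimal sketch on made-up data (the vectors group and value are hypothetical). It compares the scores that reorder() computes with and without na.rm = TRUE when one value is missing.

```r
# Toy data with one missing value (hypothetical, for illustration only)
group <- c("a", "a", "a", "b", "b", "b")
value <- c(1, 2, NA, 3, 4, 5)

# Without na.rm, the mean for group "a" is NA
f_default <- reorder(group, value, mean)
attr(f_default, "scores")

# na.rm = TRUE is passed along to mean(), so group "a" gets a usable score
f_narm <- reorder(group, value, mean, na.rm = TRUE)
attr(f_narm, "scores")   # a = 1.5, b = 4
levels(f_narm)           # "a" "b"
```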