Randy Pruim
eCOTS 2014
“A lot of times you end up putting in a lot more volume, because you are teaching fundamentals and you are teaching concepts that you need to put in, but you may not necessarily use because they are building blocks for other concepts and variations that will come off of that … In the offseason you have a chance to take a step back and tailor it more specifically towards your team and towards your players.”
Mike McCarthy, Head Coach, Green Bay Packers
Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
Antoine de Saint-Exupéry (writer, poet, pioneering aviator)
One key to successfully introducing R is finding a set of commands that is
It is not enough to use R; it must be used elegantly.
The mosaic package offers one way to do this.
goal( ~ x, data = mydata )
goal( y ~ x | z , data = mydata )
goal( formula , data = mydata )
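In the template, goal, y, x, z and mydata are placeholders. A minimal sketch of how they might be filled in, assuming the mosaic package is installed and that loading it makes lattice and the HELPrct data set available (true of recent versions, but an assumption here):

```r
library(mosaic)   # assumption: also attaches lattice and the HELPrct data

# goal( ~ x, data = mydata ): one variable, so it sits on the right of ~
one_var <- histogram( ~ age, data = HELPrct )

# goal( y ~ x, data = mydata ): two variables
two_var <- bwplot( age ~ substance, data = HELPrct )

# goal( y ~ x | z, data = mydata ): one panel for each level of z
by_panel <- bwplot( age ~ substance | sex, data = HELPrct )

# The same shape works when the goal is a number rather than a picture
group_means <- mean( age ~ substance, data = HELPrct )

print(by_panel)   # lattice plots are objects; printing one draws it
group_means
```

Only the name of the goal and the formula change from line to line; the data argument stays the same.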
The data: HELPrct
Variables: age, substance
Command: bwplot()
Raise your hand when you have created this plot
bwplot( age ~ substance, data=HELPrct)
Raise your hand when you have created this plot.
bwplot( substance ~ age, data=HELPrct )
histogram( ~ age, data=HELPrct)
Note: When there is one variable it is on the right side of the formula.
histogram( ~age, data=HELPrct )
densityplot( ~age, data=HELPrct )
bwplot( ~age, data=HELPrct )
qqmath( ~age, data=HELPrct )
freqpolygon( ~age, data=HELPrct )
bargraph( ~sex, data=HELPrct )
xyplot( i1 ~ age, data=HELPrct )
bwplot( age ~ substance, data=HELPrct )
bwplot( substance ~ age, data=HELPrct )
histogram(), qqmath(), densityplot(), freqpolygon(), bargraph()
xyplot(), bwplot()
Create a plot of your own choosing with one of these data sets
names(KidsFeet) # 4th graders' feet
?KidsFeet
names(Utilities) # utility bill data
?Utilities
names(NHANES) # body shape, etc.
?NHANES
Raise your hand when you have made a plot or two.
Type a question if you have trouble.
groups = group to overlay.
y ~ x | z to create multipanel plots.
densityplot( ~ age | sex, data=HELPrct,
             groups=substance,
             auto.key=TRUE )
My approach:
Big idea:
histogram( ~ age, data=HELPrct )
mean( ~ age, data=HELPrct )
[1] 35.65
The mosaic package includes formula-aware versions of mean(), sd(), var(), min(), max(), sum(), IQR(), …
It also provides favstats() to compute our favorites.
favstats( ~ age, data=HELPrct )
 min Q1 median Q3 max  mean   sd   n missing
  19 30     35 40  60 35.65 7.71 453       0
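For comparison, here is a sketch of how the formula interface lines up with the base R way of computing the same summaries. It assumes mosaic is installed and HELPrct is available after loading it; the variable names m_formula, m_base and fs are only for illustration.

```r
library(mosaic)   # assumption: makes HELPrct available

# Formula interface: name the variable in the formula, name the data once
m_formula <- mean( ~ age, data = HELPrct )

# Base R: pull the column out of the data frame, then summarise it
m_base <- base::mean(HELPrct$age)

# Both routes should give the same number
isTRUE(all.equal(as.numeric(m_formula), m_base))

# favstats() returns a one-row data frame, so single summaries can be extracted
fs <- favstats( ~ age, data = HELPrct )
fs$median
fs$n
```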
tally( ~ sex, data=HELPrct)
female   male
   107    346
tally( ~ substance, data=HELPrct)
alcohol cocaine  heroin
    177     152     124
Three ways to think about this. All do the same thing.
sd( age ~ substance, data=HELPrct )
sd( ~ age | substance, data=HELPrct )
sd( ~ age, groups=substance, data=HELPrct )
alcohol cocaine  heroin
  7.652   6.693   7.986
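As a point of reference, the grouped standard deviation can also be computed in base R with tapply(). The sketch below assumes mosaic is installed and HELPrct is available; sd_formula and sd_base are illustrative names.

```r
library(mosaic)   # assumption: makes HELPrct available

# Formula interface: "age by substance"
sd_formula <- sd( age ~ substance, data = HELPrct )

# Base R: split age by substance and apply sd() to each piece
sd_base <- with(HELPrct, tapply(age, substance, stats::sd))

# The two results should agree, one value per level of substance
isTRUE(all.equal(as.numeric(sd_formula), as.numeric(sd_base)))
```

The formula version keeps the same shape as the plotting commands, which is the point of the template.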
tally( sex ~ substance, data=HELPrct )
        substance
sex      alcohol cocaine heroin
  female  0.2034  0.2697 0.2419
  male    0.7966  0.7303 0.7581
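Depending on the installed version of mosaic, a two-sided formula in tally() may print counts rather than proportions by default. A hedged sketch that asks for proportions explicitly through the format argument, and checks the result against a base R table (p_tally and p_base are illustrative names):

```r
library(mosaic)   # assumption: makes HELPrct available

# Request proportions explicitly rather than relying on the default
p_tally <- tally( sex ~ substance, data = HELPrct, format = "proportion" )

# Base R: proportions within each column (substance) of the two-way table
p_base <- prop.table(with(HELPrct, table(sex, substance)), margin = 2)

# Each substance column should sum to 1
colSums(p_tally)

isTRUE(all.equal(as.numeric(p_tally), as.numeric(p_base)))
```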
tally( ~ sex + substance, data=HELPrct )
        substance
sex      alcohol cocaine heroin
  female      36      41     30
  male       141     111     94
mean( age ~ substance | sex, data=HELPrct )
  A.F   C.F   H.F   A.M   C.M   H.M     F     M
39.17 34.85 34.67 37.95 34.36 33.05 36.25 35.47
median(), min(), max(), sd(), var(), favstats(), etc.
mean( age ~ sex, data=HELPrct )
bwplot( age ~ sex, data=HELPrct )
lm( age ~ sex, data=HELPrct )
female   male
 36.25  35.47
(Intercept)     sexmale
    36.2523     -0.7841
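The two outputs carry the same information: the intercept is the mean age for the reference group (female), and the sexmale coefficient is the difference between the group means, since 36.2523 - 0.7841 = 35.4682, which rounds to 35.47. A small sketch of that check, assuming mosaic is installed and HELPrct is available:

```r
library(mosaic)   # assumption: makes HELPrct available

group_means <- mean( age ~ sex, data = HELPrct )
fit <- lm( age ~ sex, data = HELPrct )
b <- coef(fit)

# Intercept           = mean age in the reference group (female)
# Intercept + sexmale = mean age in the other group (male)
rebuilt <- c(female = unname(b["(Intercept)"]),
             male   = unname(b["(Intercept)"] + b["sexmale"]))

rebuilt
group_means
```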
The mosaic package includes some other things, too
xchisq.test(), xpnorm(), xqqmath()
mPlot(): interactive plot design
histogram() controls (e.g., width)
xpnorm( 700, mean=500, sd=100)
If X ~ N(500,100), then
P(X <= 700) = P(Z <= 2) = 0.9772
P(X > 700) = P(Z > 2) = 0.0228
[1] 0.9772
xpnorm( c(300, 700), mean=500, sd=100)
If X ~ N(500,100), then
P(X <= 300) = P(Z <= -2) = 0.0228
P(X <= 700) = P(Z <= 2) = 0.9772
P(X > 300) = P(Z > -2) = 0.9772
P(X > 700) = P(Z > 2) = 0.0228
[1] 0.02275 0.97725
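The value that xpnorm() returns is the same one base R's pnorm() gives; the extra output is the z-score and the upper-tail probability. A sketch of the arithmetic using only base R:

```r
# Standardise: z = (x - mean) / sd
z <- (700 - 500) / 100

# Lower-tail probability, computed two equivalent ways
p_below <- pnorm(700, mean = 500, sd = 100)
p_from_z <- pnorm(z)

# Upper-tail probability is the complement
p_above <- 1 - p_below

round(c(z = z, below = p_below, above = p_above), 4)

# pnorm() is vectorised, which is what the two-value call relies on
pnorm(c(300, 700), mean = 500, sd = 100)
```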
xchisq.test(phs)
Pearson's Chi-squared test with Yates' continuity correction
data: phs
X-squared = 24.43, df = 1, p-value = 7.71e-07
   104.00   10933.00
(  146.52) (10890.48)
[12.34]    [ 0.17]
<-3.51>    < 0.41>

   189.00   10845.00
(  146.48) (10887.52)
[12.34]    [ 0.17]
< 3.51>    <-0.41>

key:
    observed
    (expected)
    [contribution to X-squared]
    <residual>
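The phs object is not defined in the text above. One way to reproduce the output, assuming phs is a 2 x 2 matrix holding the observed counts exactly as they are printed, is sketched below with base R's chisq.test(), which applies Yates' correction to 2 x 2 tables by default:

```r
# Assumption: rows and columns follow the order of the printed observed counts
phs <- rbind(c(104, 10933),
             c(189, 10845))

res <- chisq.test(phs)

# Test statistic and p-value; these should match the reported X-squared of 24.43
round(unname(res$statistic), 2)
res$p.value

# Expected counts, i.e. the values shown in parentheses above
round(res$expected, 2)
```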
Modeling is really the starting point for the mosaic design.
lm() and glm() defined the template