In this tutorial, I'll cover how to analyze repeated-measures designs using 1) multilevel modeling with the nlme package and 2) Wilcox's Robust Statistics package (see Wilcox, 2012). In a repeated-measures design, each participant provides data at multiple time points. Because of this, the assumptions about model error differ for variance that arises between subjects (i.e., SSB) and variance that arises within subjects (i.e., SSW). The within-subject variability is then partitioned into the effect of the experiment (i.e., SSE) and the error not accounted for by the experiment (i.e., SSR).
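For reference, this partitioning can be summarized with the standard repeated-measures ANOVA identities (added here for clarity; the abbreviations match those above):

$$SS_T = SS_B + SS_W, \qquad SS_W = SS_E + SS_R$$

where $SS_T$ is the total variability in the outcome.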
When using this tutorial, there are a few things to keep in mind:
This is a draft. I’ll be updating this page with more graphs and explanations as time allows, informed by your feedback.
Multilevel models and robust ANOVAs are just two of the ways that repeated-measures designs can be analyzed. I'll be presenting the multilevel approach using the nlme package because the assumptions about sphericity are different and are less of a concern under this approach (see Field et al., 2012, p. 576).
One-way repeated measures
The first dataset we’ll be using can be obtained from the Personality Project:
Reading the dataset
data <- read.table(file = "http://personality-project.org/r/datasets/R.appendix3.data", header = TRUE)
The main research question is: does the valence of a word affect the rate at which items are recalled? First, let's take a look at descriptive statistics for the dataset. We can break them down by item valence using the describeBy() function in the psych package, which is available on CRAN.
Generating descriptive statistics by a group
# Load the psych package
library(psych)

# Describe the Recall variable by the Valence variable
describeBy(data$Recall, group = data$Valence)
Generating descriptive statistics by a group
## group: Neg
##   var n mean  sd median trimmed  mad min max range  skew kurtosis   se
## 1   1 5 27.8 3.9     29    27.8 4.45  22  32    10 -0.39    -1.72 1.74
## --------------------------------------------------------
## group: Neu
##   var n mean  sd median trimmed  mad min max range  skew kurtosis   se
## 1   1 5 11.6 2.7     12    11.6 2.97   8  15     7 -0.09    -1.83 1.21
## --------------------------------------------------------
## group: Pos
##   var n mean   sd median trimmed  mad min max range skew kurtosis  se
## 1   1 5   40 3.81     40      40 2.97  35  45    10    0    -1.78 1.7
Graphing the Effect
We can generate a quick boxplot to display the effect of Valence on Recall using the ggplot2 package from CRAN.
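A minimal sketch of such a boxplot is below (my reconstruction, assuming ggplot2 is installed; the original figure's exact aesthetics aren't shown here):

# Load ggplot2
library(ggplot2)

# Boxplot of Recall by item Valence
ggplot(data, aes(x = Valence, y = Recall)) +
  geom_boxplot() +
  labs(x = "Item Valence", y = "Items Recalled")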
A multilevel model is simply a regression that allows the errors to be dependent on each other (as our Valence conditions were repeated within each participant). To run this type of analysis, we'll use the nlme package from CRAN, although I've also had good luck with the lme4 package if you like experimenting.
Installing nlme
install.packages('nlme')
As with any approach to model testing, we want to see whether our predictive, augmented model is better than a simple, one-parameter mean model. Thus, we begin by specifying a baseline model in which the DV, Recall, is predicted by its overall mean. Second, we specify our model of interest, in which Recall is instead predicted by the item Valence, which was repeated within subjects.
Specifying our 2 Models
# Load the nlme package
library(nlme)

# Compact Model
baseline <- lme(Recall ~ 1, random = ~1 | Subject/Valence, data = data, method = "ML")

# Augmented Model
valenceModel <- lme(Recall ~ Valence, random = ~1 | Subject/Valence, data = data, method = "ML")
One way of assessing the significance of our model is by comparing it to the baseline model. In comparing the models, we ask whether adding Valence as a predictor provides a significantly better fit than the simple mean model. We can do this with the anova() function.
Comparing the Models
anova(baseline, valenceModel)
Comparing the Models
##              Model df    AIC    BIC logLik   Test L.Ratio p-value
## baseline         1  4 125.24 128.07 -58.62
## valenceModel     2  6  84.36  88.61 -36.18 1 vs 2   44.87  <.0001
The output contains a few indicators of model fit. Generally with AIC (i.e., Akaike information criterion) and BIC (i.e., Bayesian information criterion), the lower the number the better the model, as it implies either a more parsimonious model, a better fit, or both. The likelihood ratio indicates that our valenceModel is a significantly better fit for the data than our baseline model (p < 0.0001). Therefore, the item Valence had a significant impact on the measured Recall of the participant, Χ2(2) = 44.87, p < 0.0001.
We can obtain more specific details about the model parameters using the summary() function. To see which pairs of Valence conditions differ from one another, we can then run post hoc comparisons:
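The output below is labeled as Tukey contrasts from simultaneous tests of general linear hypotheses, which is what the multcomp package's glht() function produces for an lme fit. A minimal sketch of that call (my reconstruction, assuming multcomp is installed, rather than the author's exact code):

# Load multcomp for post hoc contrasts
library(multcomp)

# Tukey-style pairwise comparisons of the Valence conditions
postHocs <- glht(valenceModel, linfct = mcp(Valence = "Tukey"))
summary(postHocs)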
##
##   Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: lme.formula(fixed = Recall ~ Valence, data = data, random = ~1 |
##     Subject/Valence, method = "ML")
##
## Linear Hypotheses:
##                Estimate Std. Error z value Pr(>|z|)
## Neu - Neg == 0   -16.20       1.31  -12.36   <2e-16 ***
## Pos - Neg == 0    12.20       1.31    9.31   <2e-16 ***
## Pos - Neu == 0    28.40       1.31   21.67   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
Thus, our post hoc analysis shows that participants' rate of recall was significantly better for positively valenced items (M = 40) than for neutral (M = 11.6, b = 28.40, p < .0001) and negatively valenced items (M = 27.8, b = 12.20, p < .0001). Similarly, negatively valenced items were recalled at a significantly higher rate than neutral items (b = -16.20, p < .0001).
Wilcox’s Robust ANOVA
As of 5/1/13, the WRS package must be compiled from source to be installed. You can obtain the source package from the R-Forge repo below:
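Purely as an illustration (the repository link the author pointed to isn't reproduced here, so treat the URL below as an assumption), a source install from R-Forge typically looked something like this:

# Install WRS from source via R-Forge (assumed repository location)
install.packages("WRS", repos = "http://R-Forge.R-project.org", type = "source")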
Unlike using lme() to analyze the data as a multilevel model, rmanova() requires that the data be in wide format. To adjust our table, we'll use the reshape2 package from CRAN and cast the data into wide format.
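A minimal sketch of the reshaping step (the object name dataWide is my own; any name would do):

# Load reshape2 for casting long data into wide format
library(reshape2)

# One row per Subject, one column per Valence condition
dataWide <- dcast(data, Subject ~ Valence, value.var = "Recall")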
For some reason, the rmanova() function doesn't like dealing with factor variables, so we'll remove the Subject identifiers (our 5 subjects) from the wide data. Finally, we'll use rmanova(), which trims the data by 20% before estimating the effect.
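Continuing the sketch from above (the dataWide object and the Neg/Neu/Pos column names are assumptions based on the Valence levels):

# Keep only the numeric recall scores, dropping the Subject column
recallScores <- as.matrix(dataWide[, c("Neg", "Neu", "Pos")])

# Robust repeated-measures ANOVA on 20% trimmed means
library(WRS)
rmanova(recallScores, tr = 0.2)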
Similar to our findings from above, Valence had a significant influence on the item recall rate of the participant, F(1.26, 2.52) = 154.66, p < .01. However, we still want to conduct post-hoc analysis on the 20% trimmed means, which we’ll do using the rmmcp() function.
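A sketch of that follow-up call, using the same assumed matrix of recall scores from above:

# Post hoc comparisons on the 20% trimmed means
rmmcp(recallScores, tr = 0.2)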
Post-hoc analysis confirms that Negatively valenced items are significantly different from both Neutral (Ψ̂ = 16, p < .01) and Positive items (Ψ̂ = -13, p < .05). Additionally, Neutral items are significantly different from positive items (Ψ̂ = -28.33, p < .01).
Two-way repeated measures (In Progress)
The second dataset we’ll be using can be obtained from the Personality Project:
             Model df      AIC      BIC    logLik   Test  L.Ratio p-value
baseline         1  5 151.9240 158.9300 -70.96201
valenceModel     2  7 153.4414 163.2498 -69.72069 1 vs 2 2.482632  0.2890
taskModel        3  8 145.7924 157.0020 -64.89621 2 vs 3 9.648959  0.0019
fullModel        4 10 149.2138 163.2258 -64.60692 3 vs 4 0.578584  0.7488
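For context, here is a sketch of how these four nested models could be specified with lme() and compared with anova(); the Recall, Valence, Task, and Subject variable names (and the data2 object) are assumptions based on the model labels above, since this section is still in progress:

# Nested models for the two-way repeated-measures design
baseline     <- lme(Recall ~ 1, random = ~1 | Subject/Valence/Task, data = data2, method = "ML")
valenceModel <- lme(Recall ~ Valence, random = ~1 | Subject/Valence/Task, data = data2, method = "ML")
taskModel    <- lme(Recall ~ Valence + Task, random = ~1 | Subject/Valence/Task, data = data2, method = "ML")
fullModel    <- lme(Recall ~ Valence * Task, random = ~1 | Subject/Valence/Task, data = data2, method = "ML")

# Compare the nested models
anova(baseline, valenceModel, taskModel, fullModel)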
References
Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. SAGE Publications.
Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing. Academic Press.