Using the data frame below, I managed to compute a repeated-measures ANOVA for subject reaction time. Here is the data frame in question:
> str(a)
'data.frame': 2778 obs. of 9 variables:
$ Phase : Factor w/ 1 level "Test": 1 1 1 1 1 1 1 1 1 1 ...
$ Subject : Factor w/ 17 levels "1","2","3","5",..: 7 7 7 7 7 7 7 7 7 7 ...
$ Group : Factor w/ 2 levels "Attn","Dist": 1 1 1 1 1 1 1 1 1 1 ...
$ Global : Factor w/ 2 levels "D","S": 1 1 1 1 1 1 1 1 1 1 ...
$ Local : Factor w/ 2 levels "D","S": 1 1 1 1 1 1 1 1 1 1 ...
$ trialtype: Factor w/ 1 level "Dist": 1 1 1 1 1 1 1 1 1 1 ...
$ RT : num 477 682 720 NaN 604 720 910 707 705 758 ...
$ ACC : logi TRUE TRUE TRUE FALSE TRUE TRUE ...
And here is the code I used to compute the ANOVA for the reaction times:
raw<-read.table('R_notarg_noattn.tdf',header=T)
head(raw)
str(raw)
raw$Subject = factor(raw$Subject)
raw$logrt = log10(raw$RT) # base-10 logarithm of RT
hist(raw$logrt)
tsttrl_nooutliers = subset(raw, logrt>2 & ACC==TRUE) # take values greater than 2 logs AND where subj responded correctly
attach(tsttrl_nooutliers) # make column names available as global variables
hist(logrt)
summary(aovrt <- aov(logrt ~ Group*Global*Local + Error(Subject/(Global*Local)), subset=Phase=='Test', data=tsttrl_nooutliers)) # ANOVA table
meanrt=10^tapply(logrt,list( Global=Global, Local=Local, Group=Group), mean) # de-log and calculate means by condition
par(mfcol=c(1,2)) # c() *combines* values into vector/list; par() sets graphical parameters... equivalent to Matlab's set() ????
barplot(meanrt[,,'Attn'],beside=T,ylim=c(700,1000),xpd=F)
barplot(meanrt[,,'Dist'],beside=T,ylim=c(700,1000),xpd=F)
detach(tsttrl_nooutliers)
I’d like to repeat a similar analysis on the error rates, which are coded in the boolean column ACC. I was wondering how I should go about doing this, since this computation requires the intermediate step of calculating error rates by subject per condition. When I say “condition”, I mean to say the unique combination of factors, i.e. $Group, $Global, $Local, $trialtype (selecting only trials where $Phase == Test, as in the previous snippet).
Could anybody point me in the right direction? In short, I’m unclear on how to obtain the error rates, which I then should have no problem feeding into the aov function.
I disagree that using aov() would pose no problems, since you would be moving from a continuous outcome to a discrete one (repeatedly observed binomial). Leaving aside the fact that this would normally call for Poisson or logistic regression, it is possible to aggregate the sum/length of ACC within categories of $Subject, $Group, $Global, $Local, and $trialtype. At the moment $trialtype and $Phase each have only one level, so subsetting appears unnecessary, but if this str() output is from a subset, you could restrict the data to the test phase by using a[a$Phase == "Test", ] as your data frame.
Edit 1: You may want to seek statistical consultation on how to approach this study design for a discrete outcome at http://www.stackexchange.com. You may not even need the aggregation step if you set up your glm() or lmer() analysis properly. I would be inclined to try a mixed model with $Subject as the grouping level and the count of ACC==TRUE as the outcome, with Poisson errors and an offset of log(length(ACC)).
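The aggregation of ACC within subject and condition mentioned above could look roughly like the following. This is only a sketch: the toy data frame `a` is made up to mimic the structure in the question, and the column names `error`, `err_rate` and `n_trials` are my own choices, not anything from the original data.

```r
# Toy stand-in for the data frame `a` (values are invented for illustration)
set.seed(1)
a <- expand.grid(trial = 1:20, Subject = factor(1:4),
                 Global = c("D", "S"), Local = c("D", "S"))
a$Group     <- factor(ifelse(a$Subject %in% c("1", "2"), "Attn", "Dist"))
a$Phase     <- factor("Test")
a$trialtype <- factor("Dist")
a$ACC       <- runif(nrow(a)) > 0.15     # TRUE = correct response

# Keep test-phase trials and code errors as TRUE
test <- subset(a, Phase == "Test")
test$error <- !test$ACC

# Error rate per subject and condition (the mean of a logical is a proportion)
errs <- aggregate(error ~ Subject + Group + Global + Local + trialtype,
                  data = test, FUN = mean)
names(errs)[names(errs) == "error"] <- "err_rate"

# Trial count per cell; same formula, so the rows line up with `errs`
errs$n_trials <- aggregate(error ~ Subject + Group + Global + Local + trialtype,
                           data = test, FUN = length)$error

head(errs)
```

Keeping the trial counts alongside the rates means the same table can later feed a count-based model (errors out of n_trials) instead of an ANOVA on the proportions.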
Edit 2: It is possible that the original approach would be adequate if you had enough observations for each subject and category that the error rates were “pseudo-continuous”, i.e. were not largely zeros.
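One way to check that, sketched here on invented data (the data frame `d`, the 15% error probability and the name `zero_share` are all assumptions for illustration), is to look at the share of subject-by-condition cells with an error rate of exactly zero before reusing the aov() call from the question on the aggregated rates:

```r
# Toy data: 8 subjects, 20 trials per Global x Local cell (values invented)
set.seed(42)
d <- expand.grid(trial = 1:20, Subject = factor(1:8),
                 Global = c("D", "S"), Local = c("D", "S"))
d$Group <- factor(ifelse(as.integer(d$Subject) <= 4, "Attn", "Dist"))
d$error <- runif(nrow(d)) < 0.15         # TRUE = error trial

# One error rate per subject and condition
rates <- aggregate(error ~ Subject + Group + Global + Local,
                   data = d, FUN = mean)

# Share of cells with no errors at all; a large value argues against aov()
zero_share <- mean(rates$error == 0)
zero_share

# Same model structure as the RT analysis, with error rate as the outcome
fit <- aov(error ~ Group * Global * Local + Error(Subject / (Global * Local)),
           data = rates)
summary(fit)
```

If most cells turn out to be zero, the count-based models from Edit 1 are probably the safer route.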