I’ve created the following code that nests a for loop inside of a for loop in R. It is a simulation to calculate Power. I’ve read that R isn’t great for doing for loops but I was wondering if there are any efficiencies I could apply to make this run a bit faster. I’m fairly new to R as well as programming of any sort. Right now the run times I’m seeing are:
m=10 I get .17 sec
m=100 I get 3.95 sec
m=1000 I get 246.26 sec
m=2000 I get 1003.55 sec
I was hoping to set the number of times to sample, m, upwards of 100K, but I’m afraid to even set this at 10K.
Here is the code:
m = 1000 # number of times we are going to take samples
popmean=120 # set population mean at 120
popvar=225 # set known/established population variance at 225
newvar=144 # variance of new methodology
alpha=.01 # set alpha
teststatvect = matrix(nrow=m,ncol=1) # empty vector to populate with test statistics
power = matrix(nrow=200,ncol=1) # empty vector to populate with power
system.time( # not needed - using to gauge how long this takes
  for (n in 1:length(power)) { # begin for loop for different sample sizes
    for(i in 1:m){ # begin for loop to take "m" samples
      y=rnorm(n,popmean,sqrt(newvar)) # sample of size n with mean 120 and var=144
      ts=sum((y-popmean)^2/popvar) # calculate test statistic for each sample
      teststatvect[i]=ts # loop and populate the vector to hold test statistics
      vecpvals=pchisq(teststatvect,n) # calculate the pval of each statistic
      power[n]=length(which(vecpvals<=alpha))/length(vecpvals) # loop to populate power vector. Power is the proportion less than or equal to alpha
    }
  }
)
I reorganized your code a bit and got rid of the inner loop.
replicate, as suggested in another answer, is nice for readability, but in this case you can do better by sampling the random numbers in one block.
colSums is faster than summing inside a for loop or using apply.
mean(pvals<=alpha) can be used in place of sum(pvals<=alpha)/length(pvals).
sapply ranges over the vector of sizes (not faster than a for loop, but cleaner and maybe easier to generalize).
Code:
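A minimal sketch of the approach described in the points above, not necessarily the exact listing from the answer. The names powfun and ssizevec come from the calls quoted later in this answer; the function body, the default m, and the seed are assumptions made for illustration, and the parameter values are copied from the question.

```r
# Sketch only: parameter values are taken from the question; the body of
# powfun is an assumption reconstructed from the points listed above.
popmean <- 120
popvar  <- 225
newvar  <- 144
alpha   <- 0.01

powfun <- function(ssize, m = 1000) {
  # Draw all m samples of size ssize in one block: one column per sample.
  y <- matrix(rnorm(ssize * m, mean = popmean, sd = sqrt(newvar)),
              nrow = ssize)
  # One test statistic per column, computed without an explicit loop.
  ts <- colSums((y - popmean)^2 / popvar)
  # Lower-tail p-values, then the proportion at or below alpha.
  pvals <- pchisq(ts, df = ssize)
  mean(pvals <= alpha)
}

set.seed(101)                 # arbitrary seed, for reproducibility only
ssizevec <- 1:200             # sample sizes to evaluate
power <- sapply(ssizevec, powfun, m = 1000)
```

Because the inner loop is replaced by a single rnorm call and colSums, the work per sample size is done in vectorised code; only the outer pass over sample sizes remains.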
Do you really need the power for every integer value of sample size, or would a more widely spaced sample be OK? (If you need exact values, interpolation would probably be pretty accurate.)
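As an illustration of the wider-spacing idea, the sketch below simulates power on a coarse grid and fills in the intermediate sample sizes by linear interpolation with approx. The grid spacing of 10, the seed, and the body of powfun are assumptions chosen for the example, not values from the original post.

```r
# Sketch only: the coarse grid (every 10th sample size) is an assumed choice.
popmean <- 120
popvar  <- 225
newvar  <- 144
alpha   <- 0.01

# Assumed block-sampling power function (one column per simulated sample).
powfun <- function(ssize, m = 5000) {
  y <- matrix(rnorm(ssize * m, mean = popmean, sd = sqrt(newvar)),
              nrow = ssize)
  ts <- colSums((y - popmean)^2 / popvar)
  mean(pchisq(ts, df = ssize) <= alpha)
}

set.seed(101)                              # arbitrary seed
coarse     <- seq(10, 200, by = 10)        # widely spaced sample sizes
pow_coarse <- sapply(coarse, powfun, m = 5000)

# Linear interpolation at every integer sample size between 10 and 200.
pow_all <- approx(coarse, pow_coarse, xout = 10:200)$y
```

This runs the simulation at 20 sample sizes instead of 191; whether linear interpolation is accurate enough depends on how smooth the power curve is over the range of interest.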
This is reasonably fast and might get you up to m=1e5 if you needed, but I’m not quite sure why you need results that are that precise; the power curve is reasonably smooth with m=5000 …
If you’re impatiently waiting for long simulations, you can also get a progress bar to print by replacing sapply(ssizevec,powfun,m=5000) with library(plyr); aaply(ssizevec,.margins=1,powfun,.progress="text",m=5000)
Finally, I think you can speed the whole thing up a lot by sampling chi-squared values directly, or by doing an analytical power calculation (!). I think that rchisq(m,df=ssize)*newvar/popvar is equivalent to the first two lines of the loop, and you might even be able to do a numerical computation on the chi-squared densities directly …
(I just tried this out, sampling m=1e5 at every value of sample size from 1 to 200 … it takes 24 seconds … but I still think it might be unnecessary.)
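A hedged sketch of both ideas, direct chi-squared sampling and a closed-form calculation. The function names powfun_chisq and powfun_exact are hypothetical. The closed form is a derivation under the question's setup, not something stated in the original post: the statistic is distributed as (newvar/popvar) times a chi-squared variable with ssize degrees of freedom, and the test rejects when the statistic is at or below qchisq(alpha, ssize).

```r
# Sketch only: function names and the seed are assumptions for illustration.
popvar <- 225
newvar <- 144
alpha  <- 0.01

# Simulated power, drawing the test statistic directly as a scaled chi-squared.
powfun_chisq <- function(ssize, m = 1e4) {
  ts <- rchisq(m, df = ssize) * newvar / popvar
  mean(pchisq(ts, df = ssize) <= alpha)
}

# Closed-form power: P(ts <= qchisq(alpha, ssize)) with
# ts distributed as (newvar / popvar) * chi-squared(ssize).
powfun_exact <- function(ssize) {
  pchisq(qchisq(alpha, df = ssize) * popvar / newvar, df = ssize)
}

set.seed(101)                                # arbitrary seed
ssizevec  <- 1:200
pow_sim   <- sapply(ssizevec, powfun_chisq, m = 1e4)
pow_exact <- powfun_exact(ssizevec)          # vectorised, no simulation

# Largest gap between simulated and closed-form power over all sample sizes.
max(abs(pow_sim - pow_exact))
```

If the closed form holds for the problem at hand, the simulation serves only as a check, and the whole power curve can be computed with one vectorised call.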