I am running a simulation to estimate the probability of an event occurring in a number of binomial trials. I start by specifying the data:
iter=5000
data=data.frame(prob=runif(300), value=runif(300))
data<-data[sample(nrow(data), iter, replace=T),]
Then I add the trial columns:
cols <- c("one","two","three","four","five","six",
"seven","eight","nine","ten","eleven","twelve")
data[,cols] <- NA
Column one contains the result of a single binomial trial, two contains the results of two binomial trials, and so on. If the event occurs in at least one of the one, two, three, …, twelve trials for a column, the cell is marked 1, otherwise 0.
Then I run the trials for iter=5000 simulations
for (col in 3:14) {
for (i in 1:iter) if (sum(rbinom((col-2),1,data[i,1]))>0) data[i,col]<-1 else data[i,col]<-0
}
Then I evaluate mean(data$value[data$one==0]), …, mean(data$value[data$twelve==0]).
My problem is that the simulation code takes forever for iter>15000.
for (col in 3:14) {
for (i in 1:iter)
data[i,col] <- if (sum(rbinom((col-2),1,data[i,1]))>0) 1 else 0
}
Any ideas?
For iter of 16000, this runs in 2.29s on my machine, compared to an (estimated) 1781s for the ordering in your original algorithm. In general, don’t assign individual elements in the data frame when you can assign the whole column at once. There may be more improvements possible, but I’ll stop at >750x speedup (and changing the algorithm from running time of O(n^2) to O(n)).
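To illustrate the advice about assigning a whole column at once, here is a minimal sketch. It is not necessarily the code the timings above refer to. It relies on rbinom() being vectorized over prob, so a single call with size = k draws the number of successes in k trials for every row, which has the same distribution as sum(rbinom(k, 1, p)) in the element-wise loop. The names k, successes and means, and the fixed seed, are my own assumptions for the example.

```r
set.seed(42)  # assumption: fixed seed, only to make the sketch reproducible

iter <- 16000
data <- data.frame(prob = runif(300), value = runif(300))
data <- data[sample(nrow(data), iter, replace = TRUE), ]

cols <- c("one", "two", "three", "four", "five", "six",
          "seven", "eight", "nine", "ten", "eleven", "twelve")

for (k in seq_along(cols)) {
  # One draw per row: number of successes in k trials with that row's prob.
  successes <- rbinom(iter, size = k, prob = data$prob)
  # Assign the whole column in one step: 1 if any trial succeeded, else 0.
  data[[cols[k]]] <- as.integer(successes > 0)
}

# Mean of value over the rows where no trial succeeded, one entry per column.
means <- sapply(cols, function(cn) mean(data$value[data[[cn]] == 0]))
```

Each pass through the loop touches the data frame once instead of once per row, which is where most of the time goes in the element-wise version. The last line also replaces the twelve separate mean() calls with a single named vector.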