I have a working solution to my problem, but I will not be able

Asked: June 13, 2026

I have a working solution to my problem, but it is too slow to use: by my estimate the whole simulation would take 2-3 years. I am therefore looking for a faster solution. This is, in essence, the code I am working with:

N=4
x <-NULL
for (i in 1:N) { #first loop
  v <-sample(0:1, 1000000, replace=TRUE) #generate data (fair coin flips)
  v <-as.data.frame(v) #convert to dataframe
  v$t <-rep(1:2, each=250) #group
  v$p <-rep(1:2000, each=500) #p.number
  # second loop
  for (j in 1:2000) { #second loop
    #count rle for group 1 for each pnumber
    x <- rbind(x, table(rle(v$v[v$t==1&v$p==j])))
    #count rle for group 2 for each pnumber
    x <- rbind(x, table(rle(v$v[v$t==2&v$p==j])))
  } #end second loop
} #end first loop
#total rle counts for both group 1 & 2
y <-aggregate(x, list(as.numeric(rownames(x))), sum)

In words: the code generates a coin-flip simulation (v), a group factor (t, with values 1 and 2) and a p.number factor (p, 1:2000). Each p.number has runs in both groups, so the run lengths are recorded for each p.number separately for group 1 and for group 2. After N passes of the first loop, aggregate presents the total run-length counts as a table, summed over both groups, all p.numbers and all N passes.

I need the first loop because the data that I am working with comes in individual files (so I’m loading the file, calculating various statistics etc and then loading the next file and doing the same). I am much less attached to the second loop, but can’t figure out how to replace it with something faster.

What can be done to the second loop to make it (hopefully, a lot) faster?

1 Answer

Editorial Team · Answer 1 · 2026-06-13T03:56:14+00:00

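If you just want to run rle and table for each combination of the values of v$t and v$p separately, there is no need for the second loop. Combine v$v, v$t and v$p into one numeric key, so that a run can never continue across a change of group or p.number, and call rle once on that key. Because the multipliers 10 and 100 are even, taking the key modulo 2 recovers the original coin value. This is much faster: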

values <- v$v + v$t * 10 + v$p * 100
runlength <- rle(values)
runlength$values <- runlength$values %% 2
x <- table(runlength)


y <- aggregate(unclass(x), list(as.numeric(rownames(x))), sum)
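As a sanity check on the idea above, here is a small sketch comparing the looped and the single-rle versions on toy data. The data frame `toy`, its size and the seed are assumptions made up for illustration. The sketch also assumes, as in the question, that the rows of each (p, t) combination are stored contiguously.

```r
set.seed(1)  # fixed seed only so the sketch is reproducible

# Toy data: 4 p.numbers, 2 groups per p.number, 5 flips per cell
toy <- data.frame(
  v = sample(0:1, 40, replace = TRUE),
  t = rep(rep(1:2, each = 5), times = 4),
  p = rep(1:4, each = 10)
)

# Loop version: one rle() per (p, t) cell, results pooled
loop_lengths <- integer(0)
loop_values <- integer(0)
for (j in 1:4) {
  for (g in 1:2) {
    r <- rle(toy$v[toy$t == g & toy$p == j])
    loop_lengths <- c(loop_lengths, r$lengths)
    loop_values <- c(loop_values, r$values)
  }
}
loop_tab <- table(lengths = loop_lengths, values = loop_values)

# Single-pass version: the key changes whenever v, t or p changes
key <- toy$v + toy$t * 10 + toy$p * 100
r <- rle(key)
r$values <- r$values %% 2  # 10 and 100 are even, so parity is v
vec_tab <- table(lengths = r$lengths, values = r$values)

# Both tables should hold the same counts
print(all(loop_tab == vec_tab))
```

If the rows of a cell were not contiguous, the key would still separate cells, but runs would be split wherever another cell's rows are interleaved, so the data would need sorting by p and t first.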

The whole code will look like this. With N as low as 4, growing the object x inside the loop will not be a severe problem, but in general I agree with @GavinSimpson that it is not good programming practice.

N=4
x <-NULL
for (i in 1:N) { #first loop
  v <-sample(0:1, 1000000, replace=TRUE) #generate data (fair coin flips)
  v <-as.data.frame(v) #convert to dataframe
  v$t <-rep(1:2, each=250) #group
  v$p <-rep(1:2000, each=500) #p.number

  values <- v$v + v$t * 10 + v$p * 100 #one key per (t, p) cell; parity still encodes v
  runlength <- rle(values)
  runlength$values <- runlength$values %% 2
  x <- rbind(x, table(runlength))

} #end first loop
y <-aggregate(x, list(as.numeric(rownames(x))), sum) #total rle counts
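For larger N, one possible way to avoid growing x inside the loop is to store each pass in a pre-allocated list and combine the pieces once at the end. This is only a sketch: the names `results` and `long` are illustrative and not from the original answer, and the result is in long format (one row per run length and coin value) rather than the wide table produced above.

```r
N <- 4
results <- vector("list", N)  # one slot per pass (per file)

for (i in seq_len(N)) {
  v <- sample(0:1, 1000000, replace = TRUE)                # coin flips
  t <- rep(rep(1:2, each = 250), length.out = length(v))   # group
  p <- rep(1:2000, each = 500)                             # p.number

  r <- rle(v + t * 10 + p * 100)  # single pass over all cells
  r$values <- r$values %% 2       # recover the coin value
  tab <- table(lengths = r$lengths, values = r$values)

  # Long format: columns lengths, values, Freq
  results[[i]] <- as.data.frame(tab, stringsAsFactors = FALSE)
}

long <- do.call(rbind, results)  # combine once, after the loop
y <- aggregate(Freq ~ lengths + values, data = long, FUN = sum)
y$lengths <- as.integer(y$lengths)
y <- y[order(y$values, y$lengths), ]
```

Each row of y then gives the total number of runs of a given length for a given coin value, summed over all N passes.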
