I need to generate simulated data where the percent censored cannot be 0 or

Question

0

Asked: June 1, 20262026-06-01T23:04:49+00:00 2026-06-01T23:04:49+00:00

I need to generate simulated data where the percent censored cannot be 0 or

0

I need to generate simulated data where the percent censored cannot be 0 or 1. That’s why I use while loop. The problem is if I increase count to 10,000 (instead of 5), the program is very slow. I have to repeat this with 400 different scenarios so it is extremely slow. I’m trying to figure out places where I can vectorize my code piece by piece. How can I avoid while-loop and still able to keep the condition?

Another approach is keep the while loop and generate a list of 10,000 dataset that meet my criteria and then apply the function to the list. Here I use summary function as an example but my real function use both X_after and delta (ie. mle(X_after,delta)). Is this a better option if I have to use while loop?

Another concern I have is memory issue. How can I avoid using up memory while doing such large simulation?

mu=1 ; sigma=3 ; n=10 ; p=0.10
dset <- function (mu,sigma, n, p) {              
   Mean <- array()
   Median <- array()
   Pct_cens_array <- array()
   count = 0
   while(count < 5) { 

     lod <- quantile(rlnorm(100000, log(mu), log(sigma)), p = p)
     X_before <- rlnorm(n, log(mu), log(sigma))
     X_after <-  ifelse(X_before <= lod, lod,  X_before)
     delta <- ifelse(X_before <= lod, 1,  0) 
     pct_cens <- sum(delta)/length(delta)
     # print(pct_cens)
     if (pct_cens == 0 | pct_cens == 1 ) next
     else {
        count <-  count +1
        if (pct_cens > 0 & pct_cens < 1) {
             sumStats <- summary(X_after)
             Median[count] <- sumStats[3]
             Mean [count]<- sumStats[4]
             Pct_cens_array [count] <- pct_cens 
             print(list(pct_cens=pct_cens,X_after=X_after, delta=delta, Median=Median,Mean=Mean,Pct_cens_array=Pct_cens_array))
          }
       }
    }

          return(data.frame(Pct_cens_array=Pct_cens_array, Mean=Mean, Median=Median)) 
 }

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-01T23:04:50+00:00

I’ve made a few little tweaks to your code without changing the whole style of it. It would be good to heed Yoong Kim’s advice and try to break up the code into smaller pieces, to make it more readable and maintainable.

Your function now gets two “n” arguments, for how many samples you have in each row, and how many iterations (columns) you want.
You were growing the arrays Median and Mean in the loop, which requires a lot of messing about reallocating memory and copying things, which slows everything down. I’ve predefined X_after and moved the mean and median calculations after the loop to avoid this. (As a bonus, mean and median only get called once instead of n_iteration times.)
The calls to ifelse weren’t really needed.
It is a little quicker to call rlnorm once, generating enough values for x and the lod, than to call it twice.

Here’s the updated function.

dset2 <- function (mu, sigma, n_samples, n_iterations, p) {    
  X_after <- matrix(NA_real_, nrow = n_iterations, ncol = n_samples)
  pct_cens <- numeric(n_iterations)
  count <- 1
  while(count <= n_iterations) {     
    random_values <- rlnorm(2L * n_samples, log(mu), log(sigma))
    lod <- quantile(random_values[1:n_samples], p = p)
    X_before <- random_values[(n_samples + 1L):(2L * n_samples)]
    X_after[count, ] <- pmax(X_before, lod)
    delta <- X_before <= lod
    pct_cens[count] <- mean(delta)
    if (pct_cens > 0 && pct_cens < 1 ) count <- count + 1
  }

  Median <- apply(X_after, 1, median)
  Mean <- rowMeans(X_after)
  data.frame(Pct_cens_array=pct_cens, Mean=Mean, Median=Median) 
}

Compare timings with, for example,

mu=1
sigma=3
n_samples=10L
n_iterations = 1000L
p=0.10
system.time(dset(mu,sigma, n_samples, n_iterations, p))
system.time(dset2(mu,sigma, n_samples, n_iterations, p))

On my machine, there is a factor of 3 speedup.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I need to generate simulated data where the percent censored cannot be 0 or

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply