I'm trying to simulate deaths over 7 years, given the following data and cumulative probabilities of death:
tab <- data.frame(id = 1:1000, char = rnorm(1000, 7, 4))   # 1000 individuals
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)    # P(dead by end of year 1, ..., 7)
How can I sample from tab$id without replacement, in a vectorized fashion, according to the cumulative probabilities in cum.prob? An id sampled in year 1 cannot be sampled again in year 2, so lapply(cum.prob, function(x) sample(tab$id, x*1000)) will not work, because each call samples independently from all ids. Is it possible to vectorize this?
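To make the problem concrete, here is a small R sketch that runs the lapply approach from the question and checks the result. The set.seed call and the names draws and all_ids are added only for illustration. Because each call to sample() draws from all 1000 ids independently, the same id can turn up in several years, and the yearly sample sizes add up to far more deaths than cum.prob implies.

```r
# Setup from the question (seed added only for reproducibility)
set.seed(42)
tab <- data.frame(id = 1:1000, char = rnorm(1000, 7, 4))
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)

# The approach from the question: one independent draw per year
draws <- lapply(cum.prob, function(x) sample(tab$id, x * 1000))

all_ids <- unlist(draws)
length(all_ids)          # 620 draws in total, although cum.prob[7] implies about 120 deaths
anyDuplicated(all_ids)   # > 0: some ids are "killed" in more than one year
```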
//M
Here’s one way. First, get the probability that a given individual dies in a given year as probYrDeath, i.e. probYrDeath[i] = Prob(individual dies in year i), where i = 1, 2, ..., 7. Because cum.prob is cumulative, these are the successive differences of cum.prob (taking the cumulative probability before year 1 to be 0).

Now generate a random sample of 1000 “death years”, with replacement, from the sequence 1:8, according to the probabilities in probYrDeath augmented by the probability of not dying by year 7 (that is, 1 - cum.prob[7]).

We interpret “DeathYr = 8” as “not dying within 7 years”, and extract the subset of tab where DeathYr != 8. Since each individual is assigned exactly one death year, no id can be sampled in two different years.

You can then verify that the cumulative proportions of deaths in each year approximate the values in cum.prob.
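The steps of the answer can be sketched in R as follows. This is a reconstruction under stated assumptions, not necessarily the answerer's exact code: the set.seed call, the deaths data frame, and the use of diff() and tabulate() are illustrative choices.

```r
# Setup from the question (seed added only for reproducibility)
set.seed(1)
tab <- data.frame(id = 1:1000, char = rnorm(1000, 7, 4))
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)

# Step 1: unconditional probability of dying in year i = 1..7,
# i.e. the successive differences of the cumulative probabilities
probYrDeath <- diff(c(0, cum.prob))

# Step 2: draw one "death year" per individual from 1:8, where 8 means
# "did not die within 7 years" (probability 1 - cum.prob[7])
DeathYr <- sample(1:8, nrow(tab), replace = TRUE,
                  prob = c(probYrDeath, 1 - cum.prob[7]))

# Step 3: keep the individuals who died within 7 years, with their death year.
# Each id gets exactly one DeathYr, so no id can die in two different years.
deaths <- data.frame(tab, DeathYr = DeathYr)[DeathYr != 8, ]

# Step 4: cumulative proportion dead by the end of each year;
# this should be close to cum.prob
cumsum(tabulate(DeathYr, nbins = 8)[1:7]) / nrow(tab)
```

tabulate() with nbins = 8 is used instead of table() so that a year with no deaths still shows up as a zero count. Because each individual's death year is drawn independently, the proportions match cum.prob only in expectation and will vary from run to run.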