I have a data frame with about 500,000 rows. One of its columns, say column A, contains positive integer values. Let there be another column, B.
I now need to create a second data frame whose number of rows equals sum(dataframe$A). This part is done.
A performance problem arises when I need to fill this new data frame with data. I am trying to create a column A2 for the second data frame as follows:
A2 <- vector()
for (i in 1:nrow(dataframe)) {
  A2 <- c(A2, rep(dataframe$B[i], dataframe$A[i]))
}
The explicit loop is obviously very slow for the large number of rows being processed. Any suggestions on how to achieve this task with faster processing?
Thanks for responses
You simply do not need the loop at all.
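rep is already vectorized: its second argument can be a vector of counts, one per element, so rep(dataframe$B, dataframe$A) should work. As a reproducible example, you can run your loop on two columns of the built-in mtcars dataset and compare it with the vectorized call. Both give the same result, and the vectorized call will be orders of magnitude faster than using a loop.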
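A minimal sketch of that comparison on mtcars. The choice of mpg as the values and carb as the repeat counts is only an assumption for illustration; any column of values paired with a column of positive integer counts would do.

```r
# Loop version, as in the question: grow the result one row at a time.
# mpg stands in for column B, carb for column A (illustrative choice).
loop_result <- vector()
for (i in seq_len(nrow(mtcars))) {
  loop_result <- c(loop_result, rep(mtcars$mpg[i], mtcars$carb[i]))
}

# Vectorized version: rep() takes a vector of counts in `times`,
# repeating each element of the first argument by the matching count.
vec_result <- rep(mtcars$mpg, times = mtcars$carb)

# Both approaches should produce the same vector,
# with one element per unit of carb.
identical(loop_result, vec_result)
length(vec_result) == sum(mtcars$carb)
```

The loop is slow mainly because c() copies the whole accumulated vector on every iteration, so the cost grows roughly quadratically with the output length, while the single rep() call allocates the result once.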