In other R code, it is common to see a data.frame declared before a loop starts.
Suppose I have a data frame data1 with 2000 rows.
In a loop, I call a web service for each row of data1 to build a new data.frame, data2. (Please don't recommend not using a loop.)
In data2$result and data2$pubcount I need to store a different value for each of the 2000 data1 items.
Do I HAVE to declare, before the loop,
data2=data.frame()
and do I have to tell R how many rows and which columns I will use later? I know that columns can be added without declaring them. What about rows? Is there an advantage in doing:
data2<-data.frame(id=data1$id)
I would like to declare and do only what I absolutely HAVE to.
Why does the empty declaration give an error once inside the loop?
Later edit: Speed and memory are not an issue. 10s vs. 30s makes no difference, and I have under 100MB of data and a big PC (8GB). A matrix is not an option since the data is mixed (numbers and text), so I have to use a non-matrix structure.
Something like this:
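A minimal sketch of what pre-allocating data2 could look like. The column names come from the question; `call_service()` is a hypothetical stand-in for the web service call, and the 5-row data1 is made up so the example runs on its own:

```r
# Stand-in for the question's data1 (which has 2000 rows).
data1 <- data.frame(id = 1:5)

# Hypothetical placeholder for the web service call (an assumption, not a real API).
call_service <- function(id) {
  list(result = paste0("item-", id), pubcount = id * 10)
}

# Pre-allocate every row and column up front with typed NA values.
data2 <- data.frame(
  id       = data1$id,
  result   = NA_character_,
  pubcount = NA_real_,
  stringsAsFactors = FALSE
)

# Fill the existing rows in place; the data.frame never has to grow.
for (i in seq_len(nrow(data1))) {
  res <- call_service(data1$id[i])
  data2$result[i]   <- res$result
  data2$pubcount[i] <- res$pubcount
}
```

Because data2 already has one row per id, assignments such as `data2$result[i] <- ...` always target an existing row. With an empty `data.frame()` there are zero rows, so the same assignment has nothing to write into.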
You should avoid manipulation of data.frames in a loop, since subsetting of data.frames is a slow operation:
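One way to keep the data.frame out of the loop, sketched under the same assumptions as the question (`call_service()` is a hypothetical stand-in for the web service, and data1 is shortened to 5 rows), is to fill pre-allocated vectors and build data2 once at the end:

```r
# Stand-in for the question's data1 (which has 2000 rows).
data1 <- data.frame(id = 1:5)

# Hypothetical placeholder for the web service call (an assumption, not a real API).
call_service <- function(id) {
  list(result = paste0("item-", id), pubcount = id * 10)
}

# Pre-allocate plain vectors of the right type and length.
n        <- nrow(data1)
result   <- character(n)
pubcount <- numeric(n)

# Only vectors are indexed inside the loop.
for (i in seq_len(n)) {
  res <- call_service(data1$id[i])
  result[i]   <- res$result
  pubcount[i] <- res$pubcount
}

# Assemble the data.frame once, after the loop.
data2 <- data.frame(
  id       = data1$id,
  result   = result,
  pubcount = pubcount,
  stringsAsFactors = FALSE
)
```

Mixed types are not a problem here, since each column lives in its own vector until the final `data.frame()` call.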
Of course, there are often better ways than a for loop. But it is strongly recommended to avoid growing objects (and I won't teach you how to do that). Pre-allocate as shown here.
Why should you pre-allocate? Because growing objects in a loop is sloooowwwww, and that's one of the main reasons why people think loops in R are slow.