I have a function in R that chokes if I apply it to a dataset with more than 1000 rows. Therefore, I want to split my dataset into a list of n chunks, each of not more than 1000 rows.
Here’s the function I’m currently using to do the chunking:
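chunkData <- function(Data, chunkSize) {
  # Zero-based chunk index per row: the first chunkSize rows get 0, the next get 1, ...
  Chunks <- floor(0:(nrow(Data) - 1) / chunkSize)
  # Subset the data once for each distinct chunk index
  lapply(unique(Chunks), function(x) Data[Chunks == x, ])
}
chunkData(iris, 100)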
I would like to make this function more efficient, so that it runs faster on large datasets.
You can do this easily using split from base R. For example, split(iris, 1:3) will split the iris dataset into a list of three data frames by row; the grouping vector 1:3 is recycled, so rows are dealt to the three groups in turn rather than in contiguous blocks. To get contiguous chunks of a given size, pass a grouping vector built from the row index instead, such as ceiling(seq_len(nrow(iris)) / 100).
Since the output is still a list of data frames, you can easily use lapply on the output to process the data, and combine the results as required.
Since speed is the primary reason for using this approach, I would recommend that you take a look at the data.table package, which works well with large data sets. If you give more information on what you are trying to achieve in your function, people at SO might be able to help.
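As a rough sketch of the split suggestion above, the chunking function could look something like this. The name chunkWithSplit is made up for illustration, and whether it is actually faster than the original on your data is something you would need to benchmark:

```r
# Hypothetical helper (name chosen for illustration): splits a data frame
# into a list of chunks, each with at most chunkSize rows.
chunkWithSplit <- function(Data, chunkSize) {
  # Grouping vector: rows 1..chunkSize map to 1, the next chunkSize rows to 2, ...
  groups <- ceiling(seq_len(nrow(Data)) / chunkSize)
  # split() partitions the rows by group instead of re-scanning
  # the index vector once per chunk
  split(Data, groups)
}

chunks <- chunkWithSplit(iris, 100)
sapply(chunks, nrow)  # iris has 150 rows, so the chunks have 100 and 50 rows
```

The result is a list named by chunk number, so lapply(chunks, yourFunction) followed by do.call(rbind, ...) should recombine the processed pieces, assuming yourFunction returns a data frame for each chunk.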