I have a very large data set, and I have already split it into 50 pieces.
So basically the files look like:
file1
file2
file3
.
.
.
file50 (data frames)
file_total <- c(file1,...,file50)
I know this will combine them into a list, but I can’t use rbind since the whole data set is huge and the plyr library just takes forever to run.
In each of the files, I have to split the data by one factor, call it “id”, and then write each of the id subsets to a .csv file.
So far, my code is:
d_split <- split(file1, file1[1])
library(plyr)
id <- unlist(lapply(d_split,"[",1,1)) # this returns the unique id
for (j in seq_along(id))
{
write.csv(d_split[[j]], file=paste(id[j], "csv", sep="."))
}
This works!!
But it doesn’t work when I try to put it inside another for loop:
for (i in file_total)
{
d_split <- split(i, i[1])
id <- unlist(lapply(d_split,"[",1,1))
for (j in seq_along(id))
{
write.csv(d_split[[j]], file=paste(id[j], "csv", sep="."))
}
}
It returns the following error message:
Error in FUN(X[[1L]], ...) : incorrect number of dimensions
I mean, I could do it manually by copying and pasting the 50 file names into the code, but I was wondering if anyone could fix my code so that one run gets it all done.
The problem comes from how you combine the data. Instead of combining the data frames with c(), put them into a list with list(), for example file_total <- list(file1, ..., file50). At that point, for (i in file_total) will iterate as you want it to.

As an explanation: using c() with data frames (as I’m assuming file1 and file2 are) will actually turn them into a list of vectors rather than a list of data frames, because each data frame is flattened into its individual columns. Thus, iterating over the result iterates over the individual columns as vectors, and indexing a vector with two subscripts, as lapply(d_split, "[", 1, 1) does, raises the "incorrect number of dimensions" error. Using list() to combine them instead lets you iterate over the data frames themselves.
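
To make the difference concrete, here is a minimal sketch. It uses two small made-up data frames in place of the real file1 and file2; the column names, the out_dir variable, and writing into tempdir() are assumptions for illustration only, not part of the original question. It first compares c() with list(), then runs the split-and-write loop over the list.

```r
# Hypothetical stand-ins for file1 and file2; the id column comes first,
# as in the question.
file1 <- data.frame(id = c("a", "a", "b"), x = 1:3)
file2 <- data.frame(id = c("c", "d", "d"), x = 4:6)

# c() flattens the data frames into a single list of column vectors.
combined_c <- c(file1, file2)
length(combined_c)              # 4: two columns from each data frame
is.data.frame(combined_c[[1]])  # FALSE: the element is a plain vector

# list() keeps each data frame intact as one element.
file_total <- list(file1, file2)
length(file_total)              # 2
is.data.frame(file_total[[1]])  # TRUE

# Assumption: write to a temporary directory so the sketch leaves no
# files in the working directory.
out_dir <- tempdir()

for (d in file_total) {
  # Split on the first column; the names of the result are the id values.
  d_split <- split(d, d[[1]])
  for (id in names(d_split)) {
    write.csv(d_split[[id]],
              file = file.path(out_dir, paste0(id, ".csv")),
              row.names = FALSE)
  }
}
```

Taking the ids from names(d_split) avoids the extra lapply() call. Note that if the same id appears in more than one of the 50 pieces, a later piece would overwrite the .csv written by an earlier one, which is also true of the loop in the question.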