I have a very large data set, and I have already split it into

Question


Asked: June 10, 2026 (2026-06-10T05:28:55+00:00)

I have a very large data set, and I have already split it into 50 pieces.
So basically, the pieces look like this:
file1
file2
file3
.
.
.
file50 (data frames)

file_total <- c(file1,...,file50)

I know this combines them into a list, but I can’t use rbind because the combined data is huge, and the plyr library just takes forever to run.

In each of the files, I have to split the data on one factor, call it “id”, and then write each id subset to its own .csv file.

So far, my code is:

d_split <- split(file1, file1[1])

library(plyr)
id <- unlist(lapply(d_split,"[",1,1)) # this returns the unique id

for (j in seq_along(id))
{ 
    write.csv(d_split[[j]], file=paste(id[j], "csv", sep="."))
}

This works!

But it doesn’t work when I try to put it inside another for loop:

for (i in file_total)
{
    d_split <- split(i, i[1])
    id <- unlist(lapply(d_split,"[",1,1)) 
    for (j in seq_along(id))
    {
        write.csv(d_split[[j]], file=paste(id[j], "csv", sep="."))
    }
}

It returns the following error message:

Error in FUN(X[[1L]], ...) : incorrect number of dimensions

I mean, I could do it manually by copying and pasting the code for each of the 50 files, but I was wondering if anyone could fix my code so that everything runs in one go.


1 Answer

Editorial Team · Answer 1 · 2026-06-10T05:28:56+00:00

The problem comes from how you combine the data. Instead of combining the data frames with c, put them in a list:

file_total <- list(file1,...,file50)

With that change, for (i in file_total) will iterate over the data frames as you want it to.
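
A minimal sketch of the corrected loop, using two small made-up data frames in place of file1 ... file50. The output directory out_dir is an assumption for illustration, and the sketch assumes each id occurs in only one input file, since write.csv would otherwise overwrite a file written for an earlier piece:

```r
# Two small stand-ins for file1 ... file50 (hypothetical data).
file1 <- data.frame(id = c("a", "a", "b"), value = 1:3)
file2 <- data.frame(id = c("c", "d", "d"), value = 4:6)

# list() keeps each data frame intact; c() would flatten them into columns.
file_total <- list(file1, file2)

# Hypothetical output directory; replace with your own path.
out_dir <- file.path(tempdir(), "id_csv")
dir.create(out_dir, showWarnings = FALSE)

for (d in file_total) {
    # Split on the first column (the id); the list names are the id values.
    d_split <- split(d, d[[1]])
    for (id in names(d_split)) {
        write.csv(d_split[[id]],
                  file = file.path(out_dir, paste(id, "csv", sep = ".")))
    }
}

# Names of the files that were written, one per id.
written <- list.files(out_dir)
```

Taking the ids from names(d_split) avoids the extra lapply step. If the objects really are named file1 through file50, mget(paste0("file", 1:50)) should build the same list without typing every name.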

As an explanation: a data frame is itself a list of columns, so using c on data frames (as I’m assuming file1 and file2 are) concatenates their columns into a single list of vectors rather than producing a list of data frames. For instance:

file1 = data.frame(x=1:20)
file2 = data.frame(y=20:40)
file_total = c(file1, file2)
# file_total will be:
# $x
#  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
#
# $y
#  [1] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

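Thus, iterating over them will actually iterate over the individual columns as vectors, and indexing a plain vector with two indices (the "[", 1, 1 call) is what raises "incorrect number of dimensions". However, using list to combine them will let you iterate over the data frames themselves: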

> list(file1, file2)
[[1]]
    x
1   1
2   2
3   3
4   4
5   5
6   6
7   7
8   8
9   9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
20 20

[[2]]
    y
1  20
2  21
3  22
4  23
5  24
6  25
7  26
8  27
9  28
10 29
11 30
12 31
13 32
14 33
15 34
16 35
17 36
18 37
19 38
20 39
21 40
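
As a quick check of the explanation above, the following sketch records the class of each element a loop would see under both ways of combining, and reproduces the error message on a single column. The variable names classes_c, classes_list, and err are made up for illustration:

```r
file1 <- data.frame(x = 1:20)
file2 <- data.frame(y = 20:40)

# Class of each element that a for loop would visit.
classes_c    <- vapply(c(file1, file2),    function(i) class(i)[1], character(1))
classes_list <- vapply(list(file1, file2), function(i) class(i)[1], character(1))

# Indexing a plain vector with two indices fails, as in the question.
err <- tryCatch({
    v <- c(file1, file2)[[1]]   # an integer vector, not a data frame
    v[1, 1]
}, error = function(e) conditionMessage(e))
```

Here classes_c holds vector classes while classes_list holds "data.frame" for both elements, and err captures the message text instead of stopping the script.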
