I have the following data frame, test, in R:
c1 c2
1 10 a
2 20 a
3 30 b
4 40 b
I then split it as follows: z = lapply(split(test$c1, test$c2), function(x) {cut(x,2)}).
z is then:
$a
[1] (9.99,15] (15,20]
Levels: (9.99,15] (15,20]
$b
[1] (30,35] (35,40]
Levels: (30,35] (35,40]
I would then like to merge the factors back together by unsplitting the list with unsplit(z, test$c2), but this produces NAs and a warning:
[1] (9.99,15] (15,20] <NA> <NA>
Levels: (9.99,15] (15,20]
Warning message:
In `[<-.factor`(`*tmp*`, i, value = 1:2) :
invalid factor level, NAs generated
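The warning comes from how unsplit() builds its result for a list of vectors: it starts from value[[1]][rep(NA, len)], an all-NA copy of the first list element. That copy only carries the levels of z$a, so writing z$b's values into it gives NA. A minimal sketch reproducing that step by hand, with the data frame rebuilt from the printed values (c1 numeric and c2 character are assumptions):

```r
# Rebuild the example data frame and the per-group cut() from the question
test <- data.frame(c1 = c(10, 20, 30, 40), c2 = c("a", "a", "b", "b"))
z <- lapply(split(test$c1, test$c2), function(x) cut(x, 2))

# unsplit() starts from an all-NA factor built from the FIRST list element,
# so it only knows the levels of z$a
x <- z$a[rep(NA, nrow(test))]
print(levels(x))

# Assigning z$b's values fails: "(30,35]" and "(35,40]" are not levels of x,
# so R warns about an invalid factor level and stores NA instead
x[test$c2 == "b"] <- z$b
print(x)
```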
If I first take the union of all the factor levels and then unsplit, the warning does not occur. For two elements this works:
z$a = factor(z$a, levels=c(levels(z$a), levels(z$b)))
unsplit(z, test$c2)
[1] (9.99,15] (15,20] (30,35] (35,40]
Levels: (9.99,15] (15,20] (30,35] (35,40]
In my real data frame the list is very large, so I need to iterate over all the list elements (not just two). What is the best way to do this?
If I understood your question properly, I think you are making this a bit more complicated than needed. Here’s one solution using plyr. We will group by the c2 variable:
which returns:
and has a structure of:
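If you would rather keep the split()/unsplit() workflow from the question, the level-union step generalizes to any number of list elements. This is a separate base-R route, not the plyr solution; it is a minimal sketch that again assumes the data frame rebuilt from the printed values (c1 numeric, c2 character):

```r
test <- data.frame(c1 = c(10, 20, 30, 40), c2 = c("a", "a", "b", "b"))
z <- lapply(split(test$c1, test$c2), function(x) cut(x, 2))

# Union of the levels of every list element, kept in list order
all_levels <- unique(unlist(lapply(z, levels), use.names = FALSE))

# Re-level each element so all of them share the full set of levels;
# factor() matches each value's label against all_levels
z <- lapply(z, factor, levels = all_levels)

# unsplit() now starts from a factor that already has every level,
# so no values are lost
res <- unsplit(z, test$c2)
print(res)
```

Because unique() keeps the first occurrence of each level, the level order follows the list order, as in the two-element example. Converting each element with as.character() before unsplit() and calling factor() on the result would also work, but then the level order has to be supplied explicitly.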