Is there a quick way to split a large data.frame by keywords?
For example, given the data set below, is there a quick way to split the data frame at each occurrence of the source:restaurant line? Another take on the question: is there a quick way to create a factor for the data frame from a list of cut-offs (in this case c(3,7,10)), giving e.g. factors=c(A,A,A,B,B,B,B,C,C,C), that I could then use in a split(mylist,factors) call? Thanks
mylist=structure(list(V1 = structure(c(5L, 3L, 7L, 8L, 6L, 4L, 7L, 2L,
1L, 7L), .Label = c("cider", "claret", "custard", "krispies",
"rhubarb", "shreddies", "source:restaurant", "weetabix"), class = "factor"),
V2 = c(1L, 5L, NA, 9L, 13L, 17L, NA, 21L, 25L, NA), V3 = c(2L,
6L, NA, 10L, 14L, 18L, NA, 22L, 26L, NA), V4 = c(3L, 7L,
NA, 11L, 15L, 19L, NA, 23L, 27L, NA), V5 = c(4L, 8L, NA,
12L, 16L, 20L, NA, 24L, 28L, NA)), .Names = c("V1", "V2",
"V3", "V4", "V5"), class = "data.frame", row.names = c(NA, -10L
))
A very clunky possible solution is below, but I’m hoping for something a bit more elegant.
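# Rows that hold the marker (here 3, 7 and 10)
a <- which(mylist[, 1] == 'source:restaurant')
temp <- NULL
for (i in seq_along(a)) {
  # Label every row up to and including marker i with the i-th letter
  temp <- c(temp, rep(letters[i], a[i] - length(temp)))
}
temp <- as.factor(temp)
split(mylist, temp)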
The factor:
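The code that originally followed this label is not present here. As a minimal sketch of one way to build such a factor, assuming a running count of the marker rows is what is wanted: `v1` is a hypothetical stand-in for the first column of `mylist`, and `is_marker` and `grp` are names introduced for illustration.

```r
# Stand-in for mylist$V1, with the same values as in the question
v1 <- c("rhubarb", "custard", "source:restaurant", "weetabix", "shreddies",
        "krispies", "source:restaurant", "claret", "cider", "source:restaurant")

# TRUE on every marker row
is_marker <- v1 == "source:restaurant"

# Running count of markers seen so far; it increases by one ON each marker row
grp <- cumsum(is_marker)
grp
# 0 0 1 1 1 1 2 2 2 3
```

Note that with this grouping each marker row opens a new group rather than closing the previous one.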
The split:
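The original code for this step is also missing. A hedged sketch of how the split could look, assuming the grouping is a running count of marker rows passed straight to `split()`: the data frame here is a cut-down stand-in for `mylist` (only `V1` and `V2`), and `pieces` is a name introduced for illustration.

```r
# Cut-down stand-in for the question's data (V1 and V2 only)
mylist <- data.frame(
  V1 = c("rhubarb", "custard", "source:restaurant", "weetabix", "shreddies",
         "krispies", "source:restaurant", "claret", "cider", "source:restaurant"),
  V2 = c(1L, 5L, NA, 9L, 13L, 17L, NA, 21L, 25L, NA)
)

# split() accepts any vector that can be coerced to a factor,
# so the running marker count can be used directly
pieces <- split(mylist, cumsum(mylist$V1 == "source:restaurant"))

names(pieces)
# "0" "1" "2" "3"
sapply(pieces, nrow)
# 2 4 3 1
```

Each element of `pieces` is itself a data frame, so it can be processed further with `lapply()`.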
UPDATE: the source:restaurant row probably sits at the end of each group that it marks. To account for this, a factor that keeps each marker row together with the rows before it would be better.
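The code for this update did not survive either. One possible way to keep each marker with the rows before it, sketched under the assumption that shifting the marker flags down by one row is acceptable: the data frame is again a cut-down stand-in for `mylist`, and `is_marker`, `grp`, `labels` and `pieces` are illustrative names.

```r
# Cut-down stand-in for the question's data (V1 and V2 only)
mylist <- data.frame(
  V1 = c("rhubarb", "custard", "source:restaurant", "weetabix", "shreddies",
         "krispies", "source:restaurant", "claret", "cider", "source:restaurant"),
  V2 = c(1L, 5L, NA, 9L, 13L, 17L, NA, 21L, 25L, NA)
)

is_marker <- mylist$V1 == "source:restaurant"

# Shift the flags down one row so the count increases on the row AFTER a marker
grp <- cumsum(c(FALSE, is_marker[-length(is_marker)]))
grp
# 0 0 0 1 1 1 1 2 2 2

# Optional letter labels, matching the A/B/C factor asked for in the question
labels <- LETTERS[grp + 1]

pieces <- split(mylist, labels)
sapply(pieces, nrow)
# A B C
# 3 4 3
```

This reproduces the grouping implied by the cut-offs c(3,7,10), with each source:restaurant row as the last row of its group.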