I have a question about grouping data into specific categories.
Generally, if I have a factor variable, I would do something like the following to bucket/recode the data into a preferred pattern:
educ = NA
educ[educ2 %in% levels(educ2)[c(5,8)]] <- "HS or Some College"
educ[educ2 %in% levels(educ2)[2:3]] <- "College Degree"
educ[educ2 %in% levels(educ2)[c(4,6)]] <- "Advanced Degree"
educ[educ2 %in% levels(educ2)[c(1,7,9)]] <- NA
educ = factor(educ)
However, I’m struggling to regroup a factor variable, TIME, which has 10,000+ levels. The data is structured as follows:
> levels(wj$time)
[1] "0:00:05" "0:00:07" "0:00:08" "0:00:10" "0:00:13" "0:00:15" "0:00:18" "0:00:23" "0:00:31" "0:00:34" "0:00:36"
[12] "0:00:39" "0:00:41" "0:00:47" "0:00:48" "0:00:54" "0:00:55" "0:00:56" "0:00:59" "0:01:01" "0:01:02" "0:01:03"
[23] "0:01:13" "0:01:17" "0:01:31" "0:01:33" "0:01:41" "0:01:44" "0:01:48" "0:01:50" "0:01:52" "0:01:53" "0:01:55"
[34] "0:02:08" "0:02:12" "0:02:13" "0:02:21" "0:02:26" "0:02:27" "0:02:30" "0:02:32" "0:02:33" "0:02:36" "0:02:37"
[45] "0:02:38" "0:02:43" "0:02:45" "0:02:53" "0:02:56" "0:03:07" "0:03:15" "0:03:19" "0:03:21" "0:03:22" "0:03:24"
[56] "0:03:30" "0:03:36" "0:03:39" "0:03:41" "0:03:49" "0:03:56" "0:03:59" "0:04:02" "0:04:04" "0:04:07" "0:04:10"
[67] "0:04:11" "0:04:12" "0:04:14" "0:04:16" "0:04:17" "0:04:19" "0:04:22" "0:04:27" "0:04:28" "0:04:30" "0:04:37"
[78] "0:04:39" "0:04:41" "0:04:49" "0:04:51" "0:04:52" "0:04:53" "0:04:54" "0:05:05" "0:05:06" "0:05:20" "0:05:22"
I’m just not sure how to quickly bucket the data into specific brackets when there are so many factor levels. I’d like to group them into, say, 0:00:00 to 0:05:00, 0:05:01 to 0:10:00, and so forth. With so many levels, I’m a little lost on how to identify where each bucket should start and end. With 10,000+ levels, the way I would traditionally do this (listing level indices by hand) is not practical. Can anyone provide any help?
Thanks!
You can split the timestamp into its components (hours, minutes, seconds) and convert it to a number of seconds: the buckets are then very easy to compute.
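Here is a minimal sketch of that idea in base R. The `time` vector is a made-up sample standing in for `wj$time`, and `fmt`, `width`, `secs` and `bucket` are names I chose for illustration. It assumes every level has the form H:MM:SS and that you want five-minute brackets closed on the right, as in the question.

```r
# Hypothetical sample standing in for wj$time (a factor of "H:MM:SS" strings)
time <- factor(c("0:00:05", "0:04:54", "0:05:00", "0:05:01",
                 "0:10:00", "0:12:30", "1:02:03"))

# Split each timestamp on ":" and convert it to a total number of seconds
parts <- strsplit(as.character(time), ":", fixed = TRUE)
secs <- vapply(parts,
               function(p) sum(as.numeric(p) * c(3600, 60, 1)),
               numeric(1))

# Helper (illustrative) to turn a number of seconds back into "H:MM:SS"
fmt <- function(s) {
  s <- as.integer(s)
  sprintf("%d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

# Break points every 5 minutes, extended to cover the largest value
width <- 300
top <- max(width, ceiling(max(secs) / width) * width)
breaks <- seq(0, top, by = width)

# Labels such as "0:00:00 to 0:05:00", "0:05:01 to 0:10:00", ...
lower <- breaks[-length(breaks)]
upper <- breaks[-1]
labels <- paste(fmt(ifelse(lower == 0, 0, lower + 1)), "to", fmt(upper))

# cut() assigns each value to its interval; intervals are (lower, upper],
# and include.lowest = TRUE makes the first one include 0
bucket <- cut(secs, breaks = breaks, labels = labels, include.lowest = TRUE)

table(bucket)
```

With your data you would replace `time` by `wj$time` and could store the result as a new column, e.g. `wj$bucket <- bucket`. Change `width` to get brackets of a different size. Brackets with no observations remain as empty levels; `droplevels(bucket)` removes them if you do not want them.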