I am trying to use colsplit to break up a vector in a data frame. The fact that colsplit takes a regular expression as an argument makes me think it can be flexible, but I am having trouble (it might just be that I don’t understand regex in R).
Here’s the problem.
Let’s create a vector…
> library(reshape)
> my_var_1 <- factor(c("x00_aaa_123","x00_bbb_123","x00_ccc_123","x01_aaa_123","x01_bbb_123","x01_ccc_123","x02_aaa_123","x02_bbb_123","x02_ccc_123"))
I would like to split it into two columns upon the first underscore.
In other words, I want my end result to be this…
x whatever
1 x00 aaa_123
2 x00 bbb_123
3 x00 ccc_123
4 x01 aaa_123
5 x01 bbb_123
6 x01 ccc_123
7 x02 aaa_123
8 x02 bbb_123
9 x02 ccc_123
I am trying to find the right regex inside colsplit that will do it, but with no luck. Here’s the closest I can get…
> colsplit(my_var_1, split="_", c("x","whatever"))
x whatever NA.
1 x00 aaa 123
2 x00 bbb 123
3 x00 ccc 123
4 x01 aaa 123
5 x01 bbb 123
6 x01 ccc 123
7 x02 aaa 123
8 x02 bbb 123
9 x02 ccc 123
That treats the split regex as a simple delimiter, so it gives me three columns. I would like it not to split on the second underscore (to make it worse, my real data has an arbitrary number of underscores, not just two).
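As a base-R workaround that does not go through colsplit at all, here is a minimal sketch that finds the position of the first underscore with regexpr() and cuts each string there with substring(), so any later underscores stay in the second column. The names s, pos and split_df are illustrative only, and the sketch assumes every element contains at least one underscore.

```r
# Base-R sketch (not colsplit): cut each string at its FIRST underscore only.
# Assumes every element contains at least one "_".
my_var_1 <- factor(c("x00_aaa_123", "x00_bbb_123", "x00_ccc_123",
                     "x01_aaa_123", "x01_bbb_123", "x01_ccc_123",
                     "x02_aaa_123", "x02_bbb_123", "x02_ccc_123"))

s <- as.character(my_var_1)            # work on the labels, not factor codes
pos <- regexpr("_", s, fixed = TRUE)   # position of the first "_" in each string

split_df <- data.frame(
  x        = substring(s, 1, pos - 1), # text before the first "_"
  whatever = substring(s, pos + 1),    # everything after it, later "_" kept
  stringsAsFactors = FALSE
)
print(split_df)

# Extra underscores stay in the second column:
substring("x03_ddd_4_5_6", regexpr("_", "x03_ddd_4_5_6", fixed = TRUE) + 1)
# "ddd_4_5_6"
```

Because regexpr() only reports the first match, the number of underscores after it does not matter.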
Is there an expression I can use for “split” that will give what I want?
I had hoped that the regex in colsplit would let me match on groups, with the group matches becoming the contents of the split columns, but that does not appear to be the case.
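The “group matches become the columns” idea can be emulated outside colsplit with capture groups in base R’s sub(). This is a minimal sketch of that different technique, not something colsplit itself does; the pattern and the name grouped_df are illustrative assumptions, and it assumes each element has at least one underscore (an element without one is returned unchanged by sub() and would land in both columns).

```r
# Base-R sketch: emulate "split on groups" with regex capture groups.
my_var_1 <- factor(c("x00_aaa_123", "x00_bbb_123", "x00_ccc_123",
                     "x01_aaa_123", "x01_bbb_123", "x01_ccc_123",
                     "x02_aaa_123", "x02_bbb_123", "x02_ccc_123"))
s <- as.character(my_var_1)

# Group 1: everything before the first "_" ([^_]* cannot cross an "_").
# Group 2: everything after that "_", including any further underscores.
pattern <- "^([^_]*)_(.*)$"

grouped_df <- data.frame(
  x        = sub(pattern, "\\1", s),  # keep only group 1
  whatever = sub(pattern, "\\2", s),  # keep only group 2
  stringsAsFactors = FALSE
)
print(grouped_df)
```

Each column is produced by replacing the whole string with one capture group, so an arbitrary number of underscores after the first one ends up in whatever.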
Edit (thanks to @Joshuaulrich): colsplit works “as intended” when using the newer reshape2!
Your code throws an error for me:
split isn’t an argument to colsplit. The argument you want is pattern, or you can just rely on positional matching: