Suppose I have a string like this:
> x <- c("16^TG40")
I am trying to get the result c(16, 2, 40), where the 2 is the length of the matched "^TG" minus 1. I am able to find this pattern by, for example:
> gsub("(\\^[ACGT]+)", " \\1 ", x)
[1] "16 ^TG 40"
However, I am unable to replace this string with its length-1 directly. Is there a simpler way of replacing the matched pattern by the length?
After quite a bit of searching (here on SO and on Google), I ended up with the stringr package, which I think is awesome. But still, it all boils down to finding the location of this pattern (using str_locate_all) and then replacing the substring with whatever value one wants (using str_sub). I have more than 100,000 strings, and this is very time-consuming (as the pattern may also occur multiple times in a string).
I’m running in parallel at the moment to compensate for the slowness, but I’d be glad to know if this is possible at all directly (or quickly).
Any ideas?
Here’s a base-R approach.
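A minimal sketch of one way to do this, assuming gregexpr() to locate the matches and the replacement form of regmatches() to substitute them. The variable names (m, out, nums) are illustrative only, and this has not been timed on 100,000 strings:

```r
x <- c("16^TG40")

# Locate every occurrence of the pattern in each element of x
m <- gregexpr("\\^[ACGT]+", x)

# Replace each match with its length minus one (the caret is not counted),
# padded with spaces so the numbers stay separated
out <- x
regmatches(out, m) <- lapply(regmatches(out, m),
                             function(s) sprintf(" %d ", nchar(s) - 1L))
out

# Optional last step: split on the spaces to get the numeric vector
nums <- as.numeric(strsplit(out, " ")[[1]])
nums
```

Because gregexpr() and regmatches() are vectorised over x, the same calls should work on a whole character vector at once, including strings where the pattern occurs several times.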
The syntax is far from intuitive but, by hewing closely to this template, you can perform all manner of manipulations and replacements of matched substrings. (See ?gregexpr for some more complicated examples.)