I want to extract a few pieces of data from a string using a single regular expression. I wrote a pattern that captures each of these pieces as a parenthesised subexpression. In a Perl-like environment I would simply assign the subexpressions to variables with code like myvar1=$1; myvar2=$2; and so on, but how do I do this in R?
So far, the only way I have found to access these captured substrings is through regexec. It is not very convenient, partly because regexec does not support Perl syntax, among other reasons. This is what I have to do now:
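For reference, here is a small sketch of what regexec() returns, using a shortened, made-up input string (an assumption for illustration only). The result is a list with one integer vector per input string: the first element is the start of the whole match, the remaining elements are the starts of the parenthesised subexpressions, and their lengths are stored in the "match.length" attribute. This is the structure the helper function below indexes into.

```r
# Shortened, illustrative input (not the real data)
x <- "junk, 12.34, -1.5, end"
m <- regexec("([0-9]+\\.[0-9]+), (-?[0-9]+\\.[0-9]+)", x)

# Start positions: whole match first, then one entry per subexpression
starts <- m[[1]]
# Lengths of the whole match and of each subexpression, in the same order
lengths <- attr(m[[1]], "match.length")

print(as.integer(starts))   # 7 7 14
print(as.integer(lengths))  # 11 5 4
```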
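getoccurence <- function(text, rex, n) { # rex is the result of the regexec function
  # Element 1 is the whole match, so subexpression n is element n + 1
  occstart <- rex[[1]][n + 1]
  occstop <- occstart + attr(rex[[1]], "match.length")[n + 1] - 1
  # Bug fix: the original used occstart[i], but no variable i exists here
  occtext <- substr(text, occstart, occstop)
  return(occtext)
}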
mytext <- "junk text, 12.3456, -01.234, valuable text before comma, all the rest"
mypattern <- "([0-9]+\\.[0-9]+), (-?[0-9]+\\.[0-9]+), (.*),"
rez <- regexec(mypattern, mytext)
var1 <- getoccurence(mytext, rez, 1)
var2 <- getoccurence(mytext, rez, 2)
var3 <- getoccurence(mytext, rez, 3)
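Still relying only on regexec(), one way to drop the helper is to use substring(), which is vectorised over its start and stop arguments, so all subexpressions come out in a single call. This is a sketch, not a definitive solution; the names m, len and groups are illustrative.

```r
mytext <- "junk text, 12.3456, -01.234, valuable text before comma, all the rest"
mypattern <- "([0-9]+\\.[0-9]+), (-?[0-9]+\\.[0-9]+), (.*),"

m <- regexec(mypattern, mytext)[[1]]
# regexec() reports -1 when the pattern does not match at all
if (m[1] == -1) stop("pattern did not match")

len <- attr(m, "match.length")
# One substring() call extracts every piece; [-1] drops the whole match
groups <- substring(mytext, m, m + len - 1)[-1]

var1 <- groups[1]  # "12.3456"
var2 <- groups[2]  # "-01.234"
var3 <- groups[3]  # "valuable text before comma"
```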
This is obviously a rather clumsy solution, and there must be something better. I would appreciate any advice.
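Have you had a look at regmatches? It takes the match data returned by regexpr, gregexpr or regexec and returns the matched substrings directly.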
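A minimal sketch of that suggestion, applied to the data from the question. regmatches() turns the regexec() result into a list with one character vector per input string: the whole match first, then each parenthesised subexpression. The second part covers the Perl-syntax concern: regexpr(perl = TRUE) records capture groups in its "capture.start" and "capture.length" attributes. The variable names are illustrative.

```r
mytext <- "junk text, 12.3456, -01.234, valuable text before comma, all the rest"
mypattern <- "([0-9]+\\.[0-9]+), (-?[0-9]+\\.[0-9]+), (.*),"

# parts[1] is the whole match; parts[2], parts[3], ... play the role of $1, $2, ...
parts <- regmatches(mytext, regexec(mypattern, mytext))[[1]]
var1 <- parts[2]  # "12.3456"
var2 <- parts[3]  # "-01.234"
var3 <- parts[4]  # "valuable text before comma"

# Perl-compatible variant: capture positions are stored as attributes,
# one column per group
r <- regexpr(mypattern, mytext, perl = TRUE)
cs <- attr(r, "capture.start")
cl <- attr(r, "capture.length")
groups_perl <- substring(mytext, cs, cs + cl - 1)
```

With a vector of input strings, the regmatches() list has one element per string, and an empty character vector marks a string that did not match.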