This is a bare-minimum example taken from a larger, more complex dataset; I'm just trying to get my head around something.
> grep("X10\\.1+",c("X10.10","X10.11","X10.12"))
[1] 1 2 3
I would have expected only 2 to be returned, since ‘+’ is supposed to mean ‘1 or more of the preceding element’. I thought escaping the period (which I have to deal with, so I want to keep it in the example) might be causing the issue, so I also tried it without the period:
> grep("X101+",c("X1010","X1011","X1012"))
[1] 1 2 3
So, my understanding of the functionality of ‘+’ is wrong?
CONCLUSION:
Thanks @James. So my understanding was that + meant ‘ANOTHER 1 or more of the preceding element’, as opposed to what it actually means, which is ‘JUST 1 or more of the preceding element’.
11+ would have done what I was thinking (having an ADDITIONAL 1 or more 1’s after the first 1 etc). Cheers
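For anyone landing here later, a quick sketch of the difference, using the same three strings as above (the names x, one_or_more and extra_ones are just for illustration):

```r
x <- c("X10.10", "X10.11", "X10.12")

# "1+" means one or more 1s, so the single 1 after the period is enough
one_or_more <- grep("X10\\.1+", x)
one_or_more
# [1] 1 2 3

# "11+" means a literal 1 followed by one or more further 1s
extra_ones <- grep("X10\\.11+", x)
extra_ones
# [1] 2
```

Note that, as written, "X10\\.11+" would still match a longer string such as "X10.112", because nothing in the pattern says where the match has to stop.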
You need to signify that after any number of 1s, you want to match the end of the string. You use ‘$’ to do this. Similarly, ‘^’ matches the start of the string, if you want to restrict the match to strings that start with ‘X10.’ rather than, for instance, ‘PX10.’, which would be matched by the existing regex.
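A minimal sketch of both anchors applied to the strings from the question (the names x, end_anchored and both_anchored, and the extra "PX10.11" element, are only there for illustration):

```r
x <- c("X10.10", "X10.11", "X10.12")

# Anchored at the end: only 1s may sit between the period and the end of the string
end_anchored <- grep("X10\\.1+$", x)
end_anchored
# [1] 2

# Anchored at both ends: a string with a prefix, such as "PX10.11", is rejected too
both_anchored <- grep("^X10\\.1+$", c(x, "PX10.11"))
both_anchored
# [1] 2
```

Without the ‘^’, the fourth element would match as well, since the pattern is free to start matching partway through the string.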