Say I have a string like this:
[1] "<u>Degradation:</u> AGL, PGM1, PGM2, PGM3, PYGL, PYGM.<br>\n"
I want to extract each of these gene IDs into a vector. I could probably use strsplit in this case, but I want to do this with a regex because I will have more complex cases later. Say I want to extract every substring that matches '[A-Z0-9]{2,}' (that is, any run of at least two capital letters and/or digits).
Thoughts?
The stringr package makes this kind of thing pretty easy.
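Here is a minimal sketch of how that might look, assuming the string from the question is stored in a variable called x (that name is mine, not from the question). It uses str_extract_all from stringr, with a base R equivalent alongside for comparison:

```r
library(stringr)

# The example string from the question (the variable name x is an assumption).
x <- "<u>Degradation:</u> AGL, PGM1, PGM2, PGM3, PYGL, PYGM.<br>\n"

# str_extract_all() returns a list with one element per input string;
# [[1]] pulls out the character vector of matches for the single input.
genes <- str_extract_all(x, "[A-Z0-9]{2,}")[[1]]
print(genes)
# [1] "AGL"  "PGM1" "PGM2" "PGM3" "PYGL" "PYGM"

# Base R equivalent, with no extra package needed.
genes_base <- regmatches(x, gregexpr("[A-Z0-9]{2,}", x))[[1]]
```

Note that the pattern as written also matches runs made only of digits (for example "12"), so a more complex input may need a stricter pattern.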