I have a string where I have to replace some content:
"...content... <a href='document/link/B1'>foo</a> ...content... <a href='document/link/B2'>bar</a> ..."
I’m looking for a clean way to obtain something like this:
"...content... <a href='document/link/23'>foo</a> ...content... <a href='document/link/24'>bar</a> ..."
Here '23' and '24' in the links are the results of some processing that I did. So first I need to select the links and get their URLs (more specifically, I need the B1 and B2 parts), then run some processing on e.g. B1, which yields '23', and finally insert that result back into the string.
Is there a nice way to achieve this?
In general, it's a bad idea to use regex to parse HTML/XML. But for occasional use (a one-off run), if you are sure about the structure of your HTML and don't need much robustness, something like this (based on this) could do the trick:
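A minimal sketch of that regex approach in Python, offered as one possible illustration rather than the only way to do it. The names `ID_MAP`, `resolve_id` and `rewrite_links` are hypothetical; `resolve_id` stands in for whatever processing turns `B1` into `23`. The pattern assumes the exact link shape from the question (single quotes, `href` as the first attribute).

```python
import re

# Hypothetical stand-in for the real processing step (B1 -> 23, B2 -> 24).
ID_MAP = {"B1": "23", "B2": "24"}


def resolve_id(old_id):
    """Return the new id for an old one; replace with the real processing."""
    return ID_MAP[old_id]


# Group 1: everything up to the id, group 2: the id, group 3: the closing quote.
LINK_RE = re.compile(r"(<a href='document/link/)([^']+)(')")


def rewrite_links(html):
    # re.sub accepts a function; it is called once per match and its
    # return value replaces the matched text.
    return LINK_RE.sub(
        lambda m: m.group(1) + resolve_id(m.group(2)) + m.group(3),
        html,
    )


text = ("...content... <a href='document/link/B1'>foo</a> "
        "...content... <a href='document/link/B2'>bar</a> ...")
print(rewrite_links(text))
```

Passing a function as the replacement keeps the "extract, process, insert back" steps in a single pass over the string.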
You can tweak the regex pattern to be slightly more general (extra spaces, tabs, additional attributes inside the A tag, case insensitivity, double quotes), but once you try to be fully general, so that your code works with any well-formed HTML, you're screwed: use an XML/DOM parser instead.
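As a rough illustration of those tweaks, here is a more tolerant pattern sketched in Python. It is still not a general HTML parser. `ID_MAP`, `resolve_id` and `rewrite_links_tolerant` are hypothetical names, and leaving unknown ids unchanged is an assumption made for this example.

```python
import re

# Hypothetical stand-in for the real processing step.
ID_MAP = {"B1": "23", "B2": "24"}


def resolve_id(old_id):
    # Assumption: ids with no known mapping are left as they are.
    return ID_MAP.get(old_id, old_id)


# Tolerates other attributes before href, whitespace around '=',
# either quote style, and upper/lower-case tag and attribute names.
# Group 1: prefix up to the id, group 2: the opening quote character,
# group 3: the id, group 4: the matching closing quote.
TOLERANT_LINK_RE = re.compile(
    r"""(<a\b[^>]*?\shref\s*=\s*(["'])document/link/)([^"']+)(\2)""",
    re.IGNORECASE,
)


def rewrite_links_tolerant(html):
    return TOLERANT_LINK_RE.sub(
        lambda m: m.group(1) + resolve_id(m.group(3)) + m.group(4),
        html,
    )


sample = '<A class="x"  HREF = "document/link/B1">foo</A>'
print(rewrite_links_tolerant(sample))
```

Each added tolerance makes the pattern harder to read, which is a good sign of when it is time to switch to a parser.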