I'm rubbish with regex, so if someone could help I'd be very appreciative.
It's going to be a bit of a tough one, I imagine, so my hat's off to anyone who can solve it!
Say we have a file that contains two HTML tags in the following formats:
abc1234
<a href="http://google.com">Some Text</a> <P>
<a href="http://www.google.com" rel="nofollow">Some Text</a>
abc1234
I'm trying to remove everything in those tags except the URL (leaving the other text alone), so the output of the regex on this document would be:
abc1234
http://google.com <P>
http://www.google.com
abc1234
Can any guru figure this one out? I'd prefer one regular expression that handles both cases, but two separate ones would be fine too.
Thanks in advance.
I’m a Rubyist, so my example is going to be in Ruby. I’d recommend using two regexes, just to keep things straight:
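As a rough sketch of what such a pair of regexes could look like: only the name `tag_reg` is mentioned in this answer, so `url_reg` and both patterns below are assumptions. They also assume the `href` value is double-quoted and that no attribute value contains a `>`.

```ruby
# Hypothetical reconstructions; only the name tag_reg appears in the answer.
url_reg = /href="([^"]*)"/i         # group 1 is the URL inside href="..."
tag_reg = /<a\b[^>]*>.*?<\/a>/im    # a whole <a ...>...</a> element

line = '<a href="http://google.com">Some Text</a> <P>'

matched_tag = line[tag_reg]      # the full anchor element, without the trailing " <P>"
matched_url = line[url_reg, 1]   # just the captured URL

puts matched_tag
puts matched_url
```

The `\b` after `<a` keeps the tag pattern from matching other elements whose names start with "a", such as `<abbr>`.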
You’ll want to pull the URL out with the first regex and store it temporarily, then replace the entire tag (matched with tag_reg) with the stored URL.
You might be able to combine them, but it doesn’t seem like a good idea. You’re fundamentally altering (by deleting) the original tag, and replacing it with something inside itself. Less chance of things going wrong if you separate those two steps as much as possible.
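For comparison, a combined version might look something like the sketch below. The name `combined_reg` and the pattern itself are illustrative, and it assumes a double-quoted `href` and no `>` inside attribute values. It is more compact, but a tag without an `href` simply won't match, and the pattern is harder to read and debug.

```ruby
# Illustrative single-regex variant: match the whole anchor element and
# capture the href value in group 1.
combined_reg = /<a\b[^>]*?\bhref="([^"]*)"[^>]*>.*?<\/a>/im

sample = '<a href="http://www.google.com" rel="nofollow">Some Text</a> <P>'

# '\1' in the replacement string refers to the captured URL.
result = sample.gsub(combined_reg, '\1')
puts result
```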
Example in Ruby
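A minimal sketch of the two-step approach described above, not necessarily identical to the code originally posted. The name `url_reg` and both patterns are assumptions, and the input is inlined as a heredoc rather than read from a file. It assumes double-quoted `href` values.

```ruby
# Assumed patterns: url_reg pulls the URL, tag_reg matches the whole element.
url_reg = /href="([^"]*)"/i
tag_reg = /<a\b[^>]*>.*?<\/a>/im

input = <<~TEXT
  abc1234
  <a href="http://google.com">Some Text</a> <P>
  <a href="http://www.google.com" rel="nofollow">Some Text</a>
  abc1234
TEXT

output = input.gsub(tag_reg) do |tag|
  # Step 1: pull the URL out of the matched tag and hold on to it.
  url = tag[url_reg, 1]
  # Step 2: replace the entire tag with the stored URL.
  # A tag without an href is left unchanged.
  url || tag
end

puts output
```

To run it against a real file, `input` could be replaced with `File.read("input.html")`, where the file name is a placeholder.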
Even if you don’t use Ruby, I hope the example makes sense. I tested this on your given input file, and it produces the expected output.