I have the following regular expression to match HTML links:
<a\s*href=['|'](http:\/\/(.*?)\S['|']>
It kind of works, but not really: it grabs everything after the < a href… and just keeps going. I want to exclude the quote characters from that last \S match. Is there any way of doing that?
EDIT: This would make it grab only up to the quotes instead of everything after the < a href btw
I don’t think your regex is doing what you want.
This captures anything non-greedily from http:// up to the first non-space character before a double quote, single quote, or pipe (inside a character class, | is a literal pipe, not alternation). For that matter, I’m not sure how it parses at all, since it has two opening parentheses but only one closing parenthesis.
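Both points can be checked quickly. The sketch below uses Python's re module purely as an assumption, since the question does not say which regex engine is in use; the helper name compiles is made up for illustration.

```python
import re

# The pattern from the question, copied verbatim.
question_pattern = r"<a\s*href=['|'](http:\/\/(.*?)\S['|']>"

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Inside [...] the pipe is an ordinary character, so this class matches
# a single quote, a literal '|', or a double quote.
quote_class = re.compile(r"""['|"]""")

print(compiles(question_pattern))        # False: a group is never closed
print(bool(quote_class.fullmatch("|")))  # True: the pipe itself matches
```

Other engines may report the unbalanced group differently, but dropping the | from the character class is safe everywhere, because a class already means "any one of these characters".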
If you are trying to capture the href, you might try something like this:
This uses the .*? (non-greedy match anything) to allow for other attributes (target, title, etc.). It matches an href that begins and ends with either a single or double quote (it does not distinguish, and allows the href to open with one and close with the other).
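A pattern fitting that description might look like the sketch below. It is a reconstruction from the description, not necessarily the exact expression originally posted; Python is assumed, and HREF_PATTERN, extract_hrefs, and the example.com URLs are invented for illustration.

```python
import re

# Hypothetical pattern built from the description above:
#   <a\s       the opening tag followed by whitespace
#   .*?        non-greedy filler for other attributes before href
#   ['"]       an opening quote of either kind (no pipe in the class)
#   (http://.*?)  the captured URL, non-greedy so it stops at the next quote
#   ['"].*?>   a closing quote of either kind, then the rest of the tag
HREF_PATTERN = re.compile(r"""<a\s.*?href=['"](http://.*?)['"].*?>""", re.IGNORECASE)

def extract_hrefs(html: str) -> list[str]:
    """Return the captured http:// URLs, without the surrounding quotes."""
    return HREF_PATTERN.findall(html)

sample = (
    '<a target="_blank" href="http://example.com/page" title="x">one</a> '
    "<a href='http://example.org/'>two</a>"
)
print(extract_hrefs(sample))
# ['http://example.com/page', 'http://example.org/']
```

Because the quotes sit outside the capture group, they never end up in the result, which is what the question asks for. If the URL must not run past a quote under any circumstances, replacing the inner .*? with a negated class such as [^'"]* is a stricter alternative. The non-greedy filler can still run across tag boundaries on unusual input (for example an anchor without an href followed by one with it), so for arbitrary HTML a real parser is likely more reliable than any single regex.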