I need to replace the domain name in every link on a page, except links that point to images or PDF files.
The input is a full HTML page received through a proxy service.
Example:
<a href="http://www.test.com/bla/bla">test</a><a href="/bla/bla"><img src="http://www.test.com" /><a href="http://www.test.com/test.pdf">pdf</a>
<a href="http://www.test.com/bla/bla/bla">test1</a>
Result:
<a href="http://www.newdomain.com/bla/bla">test</a><a href="/bla/bla"><img src="http://www.test.com" /><a href="http://www.test.com/test.pdf">pdf</a>
<a href="http://www.newdomain.com/bla/bla/bla">test1</a>
If the domain is http://www.example.com, the following should do the trick:
This uses a negative lookahead so that the regex matches only if the string does not contain pdf, png, jpg, or gif at the specified position.
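The negative-lookahead idea described above could be sketched in Python roughly as follows. This is only an illustration, not the original answer's pattern: it uses the question's `http://www.test.com` and `http://www.newdomain.com` as the old and new domains, and the names `OLD_DOMAIN`, `NEW_DOMAIN`, `LINK_RE`, and `replace_domain` are made up for the example. It assumes links are written as `href="..."` with double quotes, and that a file counts as an image or PDF when the URL ends in one of the listed extensions.

```python
import re

# Assumed values, taken from the example in the question.
OLD_DOMAIN = "http://www.test.com"
NEW_DOMAIN = "http://www.newdomain.com"

# Match the old domain only when it directly follows href=" and the rest of
# the attribute value does NOT end in .pdf, .png, .jpg/.jpeg or .gif.
# Requiring href=" also leaves <img src="..."> untouched.
LINK_RE = re.compile(
    r'(href=")' + re.escape(OLD_DOMAIN) + r'(?![^"]*\.(?:pdf|png|jpe?g|gif)")',
    re.IGNORECASE,
)


def replace_domain(html: str) -> str:
    """Swap OLD_DOMAIN for NEW_DOMAIN in href attributes that pass the lookahead."""
    # A function is used as the replacement so NEW_DOMAIN is inserted literally.
    return LINK_RE.sub(lambda m: m.group(1) + NEW_DOMAIN, html)


if __name__ == "__main__":
    page = (
        '<a href="http://www.test.com/bla/bla">test</a>'
        '<a href="http://www.test.com/test.pdf">pdf</a>'
    )
    print(replace_domain(page))
```

A URL with a query string after the extension (for example `test.pdf?x=1`) would not be excluded by this pattern, and single-quoted or unquoted attributes are not handled. For anything beyond simple, predictable markup, an HTML parser is likely the safer choice.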