In PHP, I want to encode ampersands that have not already been encoded. I came up with this regex
/&(?=[^a])/
It seems to work well so far, but seeing as I'm not much of a regex expert, can anyone see any potential pitfalls in this regex?
Essentially it needs to convert & to &amp; but leave the & in &amp; as is (so as not to get &amp;amp;)
Thanks
Update
Thanks for the answers. It seems I wasn't thinking broadly enough to cover all bases. This seems like a common pitfall of regexes themselves (having to think of every input that could give your regex a false positive). It sure does beat my original one, str_replace(' & ', ' &amp; ', $string); 🙂
Even better would be a negative lookahead assertion to verify that & isn't followed by amp;
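A minimal sketch of that lookahead inside a preg_replace call. The function name and the sample string are assumptions made for illustration only.

```php
<?php
// Hypothetical helper: encodes every & that is not already the start of "&amp;".
function encode_bare_ampersands($str)
{
    // (?!amp;) is a negative lookahead: match & only when "amp;" does not follow it.
    return preg_replace('/&(?!amp;)/', '&amp;', $str);
}

echo encode_bare_ampersands('Fish & chips &amp; peas'), "\n";
// Fish &amp; chips &amp; peas
```

Unlike /&(?=[^a])/, this also catches an ampersand at the very end of the string and one followed by any word starting with "a" other than amp;.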
Though that will change any ampersands used for other entities. If you're likely to have others, then how about something like /&(?!#?[a-z0-9]+;)/i
This will look for an ampersand, but asserting that it is NOT followed by an optional hash symbol (for numeric entities), a series of alphanumerics and a semicolon, which should cover named and numeric entities like &quot; or &#xAA;
Test code
Running the replacement over a sample sentence that mixes entities and bare ampersands will output a string which is more easily read as ‘It’s 30 ° outside & very hot. T-shirt & shorts needed!’
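A test along those lines can be sketched as follows. The exact input string is an assumption, chosen so that it renders as the sentence quoted above; the pattern is the one described earlier (optional hash, alphanumerics, semicolon), with the i flag so that uppercase hex digits and capitalised entity names are also recognised.

```php
<?php
// Assumed sample input: two named entities, one bare ampersand, one existing &amp;.
$text = "It&rsquo;s 30 &deg; outside & very hot. T-shirt &amp; shorts needed!";

// Leave alone any & that already starts a named (&deg;) or numeric (&#176;, &#xB0;) entity.
$encoded = preg_replace('/&(?!#?[a-z0-9]+;)/i', '&amp;', $text);

echo $encoded, "\n";
// It&rsquo;s 30 &deg; outside &amp; very hot. T-shirt &amp; shorts needed!
```

Only the bare ampersand before "very hot" is changed; the three existing entities pass through untouched.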
Alternative for PHP 5.2.3+
As Ionut G. Stan points out below, from PHP 5.2.3 you can use htmlspecialchars with a fourth parameter of false to prevent double-encoding.
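For example, a short sketch of that call, where the sample string and the choice of ENT_COMPAT and UTF-8 are assumptions:

```php
<?php
$text = "T-shirt & shorts &amp; a hat";

// The fourth argument (double_encode) set to false leaves existing entities alone.
$result = htmlspecialchars($text, ENT_COMPAT, 'UTF-8', false);

echo $result, "\n";
// T-shirt &amp; shorts &amp; a hat
```

Bear in mind that htmlspecialchars also encodes <, > and double quotes, so it is not a drop-in replacement for the regex if the string contains markup you want to keep.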