This is more a puzzle question for my curiosity than anything else. I’m looking for a single regular expression substitution that will convert entity-escaped ampersands (&amp;) to unescaped ampersands (&), but only within href attributes in an HTML file. For example:
<a href="http://example.com/index.html?foo=bar&amp;baz=qux&amp;frotz=frobnitz">
Me, myself &amp; I</a>
Would convert to:
<a href="http://example.com/index.html?foo=bar&baz=qux&frotz=frobnitz">
Me, myself &amp; I</a>
Now, I can do this in several statements, but I’m curious whether any Perl regex gurus can do it in one.
The closest I’ve come so far is the following regex, which doesn’t work because lookbehinds can’t be of variable length. Even if they were allowed, I’m not sure it would work.
s/(?<=href=".*?)&amp;(?=.*?")/&/g;
Thanks.
Adapting your close approximation:
This is a cheat, but it is a single regex. The key part is the non-greedy scan for characters that are not a closing double quote, followed by the &amp; string.
The other observation to make is that given the input:
You will get out:
You have to decide whether that matters.
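A minimal sketch of a substitution matching the description above, assuming it is repeated with a `1 while` loop until nothing is left to replace. The helper name `unescape_href_ampersands` and the sample strings are hypothetical, and the exact pattern is an assumption rather than a quotation:

```perl
use strict;
use warnings;

# Hypothetical helper: repeat one substitution until it stops matching.
sub unescape_href_ampersands {
    my ($html) = @_;
    # [^"]*? scans non-greedily over characters that are not a closing
    # double quote, up to the next &amp; inside the same href value.
    # Each pass unescapes one &amp; per run; the loop ends on no match.
    1 while $html =~ s/(href="[^"]*?)&amp;/$1&/;
    return $html;
}

my $link = '<a href="http://example.com/index.html?foo=bar&amp;baz=qux&amp;frotz=frobnitz">'
         . "\nMe, myself &amp; I</a>";
print unescape_href_ampersands($link), "\n";

# A doubly escaped ampersand inside an href is unescaped twice.
print unescape_href_ampersands('<a href="?a=1&amp;amp;b=2">'), "\n";
```

With this sketch the &amp; in the link text is left alone, while a doubly escaped &amp;amp; inside an href collapses all the way to a bare &, which may or may not be what you want.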
The difficulty with any non-iterative solution is that once you’ve read the 'href="' in the first match, you won’t be seeing it again for subsequent matches.
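To illustrate that point, here is a small sketch comparing a single /g pass with a repeated substitution. The input string and variable names are made up for the example, and the pattern is the same assumed form of the 'href="' prefix followed by a non-greedy scan:

```perl
use strict;
use warnings;

# Hypothetical input with three escaped ampersands inside one href.
my $tag = '<a href="?a=1&amp;b=2&amp;c=3&amp;d=4">';

# One global pass: after the first match, scanning resumes past the
# 'href="' prefix, so the later &amp; strings no longer have it in view.
(my $one_pass = $tag) =~ s/(href="[^"]*?)&amp;/$1&/g;
print "$one_pass\n";

# Repeating the substitution rescans from the start of the string,
# so each pass finds the prefix again and fixes one more &amp;.
my $looped = $tag;
my $passes = 0;
$passes++ while $looped =~ s/(href="[^"]*?)&amp;/$1&/;
print "$looped after $passes passes\n";
```

The single pass rewrites only the first &amp; in the attribute, while the loop needs one pass per escaped ampersand.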