I want to remove consecutive links from a webpage.
Here is a sample:
<div style="font-family: Arial;">
<br>
<a href="http://google.com">AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA</a>
<a href="http://google.com">BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB</a>
Google is a search
<a href="http://www.google.com">engine</a>
In the HTML above, I want to remove the first two <a> tags but not the third one. (My script should only remove consecutive tags.)
Don’t use a regex for this. Regexes are extremely powerful, but they are a poor fit for finding this kind of “consecutive” tag.
I suggest you use the DOM instead, so you can walk the HTML as a tree.
Here is an example (not tested):
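As a rough sketch of the idea in Python: this uses the standard library's html.parser, which gives a stream of tags and text rather than a full DOM tree, so treat it as a stand-in for the DOM approach and not the same thing. I am assuming that "consecutive" means two or more <a> elements separated only by whitespace. The names LinkRunStripper and strip_consecutive_links are made up for this example, and the link text in the sample is shortened.

```python
from html.parser import HTMLParser


class LinkRunStripper(HTMLParser):
    """Hypothetical helper: re-emits HTML, dropping runs of 2+ adjacent <a> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []        # output pieces that are definitely kept
        self.run = []        # pending links plus whitespace between/after them
        self.links = 0       # number of complete links in the pending run
        self.current = None  # pieces of the <a> element being read, if any

    def _flush(self):
        # A lone link is kept; a run of two or more links is dropped.
        if self.links < 2:
            self.out.extend(self.run)
        self.run, self.links = [], 0

    def _emit(self, text):
        if self.current is not None:
            self.current.append(text)      # still inside a link
        elif self.run and text.isspace():
            self.run.append(text)          # whitespace after a link
        else:
            self._flush()                  # anything else ends the run
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.current is None:
            self.current = [self.get_starttag_text()]
        else:
            self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.current is not None:
            self.current.append("</a>")
            self.run.append("".join(self.current))
            self.links += 1
            self.current = None
        else:
            self._emit(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")

    def result(self):
        self._flush()
        # An unclosed <a> at the end of the input is passed through as-is.
        return "".join(self.out + (self.current or []))


def strip_consecutive_links(html):
    parser = LinkRunStripper()
    parser.feed(html)
    parser.close()
    return parser.result()


# Sample from the question, with the long link text shortened.
sample = (
    '<div style="font-family: Arial;">\n<br>\n'
    '<a href="http://google.com">AAAA</a>\n'
    '<a href="http://google.com">BBBB</a>\n'
    'Google is a search\n'
    '<a href="http://www.google.com">engine</a>'
)
print(strip_consecutive_links(sample))
```

Any text, tag, or comment between two links ends the run, so the "engine" link survives because "Google is a search" sits in front of it. With a real DOM the same check would be done on sibling nodes: skip whitespace-only text nodes and see whether the next sibling is also an <a>.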