I have a file containing HTML that I want to strip out. Here are some examples:
<a name="0.3__Toc308117073"></a>
<h1><a name="0.3__Toc308117071"></a><font color="#3B608D" size="4" face="Cambria"><b>Gains on Sales of Qualified Small Business Stock</b></font></h1>
I want to remove the anchor tags, and I want to remove the h1 tags along with everything between them. What would be the right syntax for preg_replace or something similar?
You should specify which parts are fixed and which might differ from case to case. I’m especially interested in the anchor name: is “0.3__Toc” the only fixed part, or is part of the number fixed as well? What about “0.2__Toc”?
If it’s OK for you to use two regexes, use something like these patterns, in this order:
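A minimal PHP sketch of such a two-pass approach follows. It assumes the anchor names always have the form "0.3__Toc" followed by digits and that the anchor sits directly after the opening "<h1>"; the helper name removeTocMarkup is hypothetical. Adjust the name pattern if other prefixes (for example "0.2__Toc") can occur.

```php
<?php
// Hypothetical helper. Assumes anchor names of the form "0.3__Toc" + digits.
function removeTocMarkup(string $html): string
{
    $anchor = '<a name="0\.3__Toc\d+"></a>';

    // Pass 1: remove each <h1> that begins with such an anchor, plus its content.
    // The "s" modifier lets "." match newlines; the lazy ".*?" stops at the first </h1>.
    $html = preg_replace('#<h1>\s*' . $anchor . '.*?</h1>#s', '', $html);

    // Pass 2: remove any remaining loose anchors of the same kind.
    return preg_replace('#' . $anchor . '#', '', $html);
}

$input = '<a name="0.3__Toc308117073"></a>' . "\n"
    . '<h1><a name="0.3__Toc308117071"></a><font color="#3B608D" size="4" face="Cambria"><b>Gains on Sales of Qualified Small Business Stock</b></font></h1>';

echo removeTocMarkup($input);
```

The order matters: running the loose-anchor pass first would strip the anchor inside the h1, and the h1 pattern would then no longer match.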
If you absolutely have to do it in one regex, you’ll have to extend it with some lookarounds to catch both cases. And that’s painful (but fun, I guess). 🙂
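A single pattern is also possible using alternation, which is a different technique from the lookaround approach mentioned above. This sketch makes the same naming assumption ("0.3__Toc" followed by digits):

```php
<?php
// One pass using alternation (not lookarounds); same naming assumption as above.
$pattern = '#<h1>\s*<a name="0\.3__Toc\d+"></a>.*?</h1>|<a name="0\.3__Toc\d+"></a>#s';

$input = '<a name="0.3__Toc308117073"></a>' . "\n"
    . '<h1><a name="0.3__Toc308117071"></a><b>Gains on Sales of Qualified Small Business Stock</b></h1>'
    . '<p>kept</p>';

// The regex engine scans left to right and tries the first alternative first,
// so a matching <h1> is removed as a whole before its inner anchor is reached.
$output = preg_replace($pattern, '', $input);
echo $output;
```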
Edit: OK. I assumed you wanted to remove only h1 tags containing that sort of anchor, plus any loose anchors of the same type. If the objective is to remove all h1 tags with their content, and all anchor tags, you can use this instead:
So that would be a call to
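A sketch of the general case (every h1 element with its content, and every anchor tag, whatever its name) using preg_replace with an array of patterns, which PHP applies in order. The patterns are illustrative; the second one strips the opening and closing <a> tags themselves but keeps any link text between them.

```php
<?php
$patterns = [
    '#<h1\b[^>]*>.*?</h1>#is', // whole h1 elements, content included, across newlines
    '#</?a\b[^>]*>#i',         // opening and closing anchor tags only
];

$input = '<a name="0.3__Toc308117073"></a>' . "\n"
    . '<h1><a name="0.3__Toc308117071"></a><font color="#3B608D" size="4" face="Cambria"><b>Gains on Sales of Qualified Small Business Stock</b></font></h1>' . "\n"
    . '<p>See <a href="notes.html">the notes</a>.</p>';

// Each pattern is replaced with the empty string, one after the other.
$output = preg_replace($patterns, '', $input);
echo $output;
```

Regular expressions are fragile on arbitrary HTML; if the markup varies much, PHP's DOMDocument is usually the more robust tool.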