I have HTML in a file that I want to remove. Here are the

Question

Asked: May 26, 2026 (2026-05-26T15:39:56+00:00)

I have HTML in a file that I want to remove. Here are the examples:

<a name="0.3__Toc308117073"></a>

<h1><a name="0.3__Toc308117071"></a><font color="#3B608D" size="4" face="Cambria"><b>Gains on Sales of Qualified Small Business Stock</b></font></h1>

I want to remove the anchor tags and I want to remove the h1 tags and everything in between. What would be the right syntax for a preg_replace or something similar?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T15:39:56+00:00

You should specify which parts are fixed and which might differ from case to case. I’m especially interested in the anchor name. Is “0.3__Toc” the only fixed part, or is part of the number that follows also fixed? What about “0.2__Toc”?

If it’s ok for you to use two regexes, then use something like these patterns in this order:

<h1><a name="0\.3__Toc\d*">.*?</a>.*?</h1>
<a name="0\.3__Toc\d*">.*?</a>
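
As a rough sketch of how those two patterns could be applied in order, here is a small PHP script run against the sample markup from the question. The variable names and the extra `<p>` line are made up for illustration. The dot in `0.3` is escaped and the quantifiers are lazy (`.*?`), so a match stops at the first closing tag; the `s` flag is an assumption that a heading might span several lines.

```php
<?php
// Sample input from the question, plus one hypothetical line that should survive.
$html = <<<'HTML'
<a name="0.3__Toc308117073"></a>
<h1><a name="0.3__Toc308117071"></a><font color="#3B608D" size="4" face="Cambria"><b>Gains on Sales of Qualified Small Business Stock</b></font></h1>
<p>Body text that should survive.</p>
HTML;

// Order matters: remove whole headings first, then any loose anchors.
// "~" is used as the delimiter so that "/" needs no escaping.
$patterns = [
    '~<h1><a name="0\.3__Toc\d*">.*?</a>.*?</h1>~s',
    '~<a name="0\.3__Toc\d*">.*?</a>~s',
];

// With an array of patterns and a single replacement string,
// preg_replace applies each pattern in turn.
$stripped = preg_replace($patterns, '', $html);

echo trim($stripped), PHP_EOL;
```

If the anchor names turn out to vary more than `0.3__Toc` followed by digits, only the `name="..."` part of each pattern should need adjusting.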

If you absolutely have to do it in one regex, you’ll have to beef that up with alternation or some lookarounds to catch both cases. And that’s painful (but fun, I guess). 🙂

Edit: Ok. I assumed you wanted to remove only h1 tags containing that sort of anchor, as well as any loose anchors of that type. If the objective is to remove all h1 tags together with their content, and all anchor tags, you can use this instead:

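(<h1>.*?</h1>)|(<a name=".*?">.*?</a>)

The quantifiers are lazy (.*?) so that each match stops at the first closing tag instead of running on to the last one. So that would be a call to

preg_replace('/(<h1>.*?<\/h1>)|(<a name=".*?">.*?<\/a>)/is', '', $htmlToStrip);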
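
A slightly more defensive variant of that call, offered only as a sketch: the helper name `stripHeadingsAndAnchors` and the test string are hypothetical, and it assumes h1 tags may carry attributes and that headings may span several lines. Regexes remain fragile on nested or malformed markup, so treat this as good enough for well-behaved input like the examples in the question, not as a general HTML cleaner.

```php
<?php
/**
 * Hypothetical helper: strips every <h1>...</h1> block and every
 * <a name="...">...</a> anchor from a string of HTML.
 */
function stripHeadingsAndAnchors(string $html): string
{
    // "~" as delimiter so "/" needs no escaping.
    // i: case-insensitive tag names; s: "." also matches newlines.
    // .*? is lazy and [^"]* / [^>]* are bounded, so one match cannot
    // run from the first opening tag to the last closing tag.
    $pattern = '~<h1\b[^>]*>.*?</h1>|<a\s+name="[^"]*"\s*>.*?</a>~is';

    $result = preg_replace($pattern, '', $html);

    // preg_replace returns null on error (for example when the
    // backtrack limit is hit); fall back to the untouched input.
    return $result ?? $html;
}

$input = '<h1>One</h1><p>keep</p><a name="x"></a><h1 class="t"><a name="y"></a>Two</h1>';
echo stripHeadingsAndAnchors($input), PHP_EOL;
```

With a greedy `.*` in place of `.*?`, the first alternative would match from the first `<h1>` to the last `</h1>` in this input and take the `<p>` element with it.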
