
Question


Asked: May 13, 2026 (2026-05-13T07:22:37+00:00)


For example, if I have this HTML:

<div>this is a test < text</div>

The < after "test" is an error, and the correct HTML should be:

<div>this is a test &lt; text</div>

But I have a lot of HTML files that were not encoded correctly, and I need to fix this error so I can parse them later. The original source of the data is not available, so the only option is to fix the HTML I have.

The same applies to the > character and to text that contains both < and > characters, like “<2000> – <2004>”. I would like to hear ideas for algorithms or libraries that can help me. Thanks.

Note: the HTML above is only a sample; the work has to be done on big HTML files.


1 Answer

Editorial Team · Answer 1 · 2026-05-13T07:22:37+00:00


I’d suggest this:

1. Identify and map the locations of all known tags, like <div> and </a>.
2. Replace every < and > that falls outside the map you built in step 1 with &lt; and &gt;.
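
The two steps above could be sketched in Python roughly as follows. This is only a minimal illustration, not a complete solution: the `KNOWN_TAGS` whitelist and the function name `escape_stray_brackets` are made up for the example, and the regular expression assumes that attribute values never contain < or > and that the files have no comments, doctype declarations, or script blocks that need special handling.

```python
import re

# Assumption: a small whitelist of tag names. Extend it to cover the tags
# that actually occur in your files.
KNOWN_TAGS = ("html", "head", "body", "div", "span", "p", "a", "b", "i", "br")

# Step 1: a pattern that recognises opening, closing, and self-closing tags
# whose name is in the whitelist. The name must follow "<" or "</" directly.
TAG_RE = re.compile(
    r"</?(?:%s)\b[^<>]*>" % "|".join(KNOWN_TAGS),
    re.IGNORECASE,
)


def _escape(text: str) -> str:
    """Escape angle brackets in a run of text that is not a known tag."""
    return text.replace("<", "&lt;").replace(">", "&gt;")


def escape_stray_brackets(html: str) -> str:
    """Step 2: escape every < and > that lies outside the known tags."""
    out = []
    pos = 0
    for match in TAG_RE.finditer(html):
        out.append(_escape(html[pos:match.start()]))  # text before the tag
        out.append(match.group(0))                    # the tag itself, untouched
        pos = match.end()
    out.append(_escape(html[pos:]))                   # text after the last tag
    return "".join(out)


if __name__ == "__main__":
    print(escape_stray_brackets("<div>this is a test < text</div>"))
```

Because the scan is a single pass over the input, it should cope with big files, but text that happens to look like a whitelisted tag (for example "a <b and c> d") would be left unescaped, so it is worth spot-checking the results.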
