I have a collection of documents that I’m attempting to parse. Like HTML, they


Asked: May 27, 2026


I have a collection of documents that I’m attempting to parse. Like HTML, they are fairly well structured and have a complex syntax/grammar. Also like HTML, many of the documents do not fully adhere to the desired syntax.

My question is, what general strategies do browsers and HTML/XML parsing libraries use when parsing documents that don’t strictly follow the right syntax? They seem to deal with misplaced or missing tags well. And I’m sure there are other situations, such as misspelled tags, incorrect attributes, etc. that must be dealt with and not simply ignored.


1 Answer

Editorial Team · Answer 1 · 2026-05-27T04:53:23+00:00


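Malformed HTML is commonly called “tag soup”. Browsers have to cope with it, and historically each one (IE, Firefox, Chrome, etc.) did so in its own way. HTML5 has since specified error recovery as part of its parsing algorithm, so modern browsers largely agree on how broken markup is repaired. Here is a good article on tag soup and some general strategies for handling it: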

http://en.wikipedia.org/wiki/Tag_soup
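The central idea behind most tag-soup parsers is a stack of currently open elements plus a handful of recovery rules that decide what to do when the input doesn't match the grammar. The sketch below illustrates that idea in Python. It is a simplified illustration, not how any particular browser or library is implemented. It reuses the standard library's lenient `html.parser.HTMLParser` tokenizer, and every name in it (`Node`, `SoupTreeBuilder`, `parse_soup`, `serialize`, `VOID_TAGS`, `CLOSES_P`) is hypothetical. The recovery rules are assumptions loosely modelled on HTML5: some start tags imply end tags (a new `<p>` closes an open `<p>`, a new `<li>` closes an open `<li>` in the same list), stray end tags are ignored, elements still open at end of input are closed implicitly, unknown or misspelled tags are kept as generic elements, and duplicate attributes keep the first value.

```python
from html.parser import HTMLParser

# Simplified, hypothetical recovery rules loosely modelled on ideas from the
# HTML5 tree-construction algorithm; the real rules are far more detailed.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
# Opening any of these implicitly closes an open <p>.
CLOSES_P = {"p", "div", "ul", "ol", "table", "h1", "h2", "h3", "h4", "h5", "h6"}
P_BOUNDARIES = {"table", "td", "th", "button"}  # stop searching for <p> here
LI_BOUNDARIES = {"ul", "ol"}  # a nested list shields outer <li> elements


class Node:
    def __init__(self, tag, attrs=()):
        self.tag = tag
        self.attrs = {}
        for name, value in attrs:
            # On duplicate attributes the first occurrence wins.
            self.attrs.setdefault(name, value)
        self.children = []  # child Nodes and text strings


class SoupTreeBuilder(HTMLParser):
    """Builds a tree from malformed markup using a stack of open elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#root")
        self.stack = [self.root]  # open elements; the root is never popped

    def _close_implied(self, tag, boundaries):
        # Close the nearest open `tag` unless a boundary element comes first.
        for i in range(len(self.stack) - 1, 0, -1):
            name = self.stack[i].tag
            if name == tag:
                del self.stack[i:]
                return
            if name in boundaries:
                return

    def handle_starttag(self, tag, attrs):
        if tag in CLOSES_P:
            self._close_implied("p", P_BOUNDARIES)
        if tag == "li":
            self._close_implied("li", LI_BOUNDARIES)
        # Unknown or misspelled tags are kept as generic elements.
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop up to the nearest matching open element, implicitly closing
        # anything left open inside it; a stray end tag is simply ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse_soup(markup):
    builder = SoupTreeBuilder()
    builder.feed(markup)
    builder.close()  # flush buffered text; elements open at EOF stay in the tree
    return builder.root


def serialize(node):
    if isinstance(node, str):
        return node  # text is not re-escaped; good enough for this demo
    inner = "".join(serialize(child) for child in node.children)
    if node.tag == "#root":
        return inner
    attrs = "".join(
        f' {name}="{value}"' if value is not None else f" {name}"
        for name, value in node.attrs.items()
    )
    if node.tag in VOID_TAGS:
        return f"<{node.tag}{attrs}>"
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


if __name__ == "__main__":
    print(serialize(parse_soup("<p>one<p>two")))  # <p>one</p><p>two</p>
    print(serialize(parse_soup("<div><b>bold")))  # <div><b>bold</b></div>
```

Misnested markup such as `<b><i>text</b></i>` is handled crudely here: the `</b>` closes both elements and the later `</i>` is dropped as a stray end tag. HTML5 instead specifies a more elaborate procedure for that case (the "adoption agency algorithm"). For real documents, a spec-conforming parser such as html5lib is a safer choice than rules of your own. For a custom document format, though, the same pattern of an open-element stack plus explicit, per-error recovery rules is a reasonable starting point.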
