I am trying to scrape some content from an HTML page. I’m using libxml2

Asked: May 16, 2026 at 21:40 UTC

I am trying to scrape some content from an HTML page. I’m using libxml2 and htmlReadMemory to get an xmlDocPtr. The HTML is simple, but it is malformed. It’s basically the following:

<tr><td><tr><td>Some content</td></tr></td></tr>

libxml2 doesn’t like the nested tr and td elements. It keeps giving me the following errors:

HTML parser error : Unexpected end tag : td
      </TD>
           ^
HTML parser error : Unexpected end tag : tr
    </TR>

I am using the following option: HTML_PARSE_RECOVER.

At this point, nothing I do gets libxml2 to parse the HTML because of this. I can’t change the HTML because I have no access to it.

Does anyone have any clues how I can get libxml2 to parse this sort of HTML?

Thanks

1 Answer

Editorial Team · Answer 1 · 2026-05-16T21:40:54+00:00

What’s the exact call you’re using to parse? If you don’t want any errors or warnings reported, I’d suggest combining these options:

HTML_PARSE_RECOVER|HTML_PARSE_NOERROR|HTML_PARSE_NOWARNING