I already searched a long time for a good solution, but I can’t find

Question


Asked: May 13, 2026


I have been searching for a good solution for a long time, but I can’t find anything that fits my needs…

I want to parse an HTML file and display its content in a table, much like writing yet another RSS feed reader. Doing that with valid XML files is simple and straightforward using NSXMLParser, TouchXML, libxml directly, or one of the other XML parsers out there… But these frameworks either work only with XML or cannot cope with non-tidy HTML. The site consists of divs containing links that wrap images, paragraphs containing links and images, and so on: just a normal website. Using libxml seems way too complicated in that case.

Does anybody have more experience with parsing dirty HTML pages? Which (free) library or framework did you use? I have the feeling that I am just missing something obvious here. It can’t be that difficult to parse HTML files, can it?

I hope you can point me in the right direction!


1 Answer

Editorial Team · Answer 1 · 2026-05-13T10:34:55+00:00


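If you need to parse most of the page, using libxml2, as Anurag suggests, is a good idea. libxml2 also ships an HTML parser (HTMLparser.h) that is designed to tolerate malformed markup, so the page does not have to be valid XML.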

If you only want small segments of data from the file, you are better off using regular expressions to pull them out. There is also a built-in regex library, which you can access through the RegexKitLite wrapper.
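To give a rough idea of the regex route, here is a minimal sketch in plain C. It uses the POSIX `regex.h` functions rather than RegexKitLite, simply because they need no extra dependency, and it should compile as-is inside an Objective-C project. The function name `first_href` and the pattern are made up for illustration; the pattern assumes double-quoted attribute values and will need adjusting for the real page.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the href value of the first <a ... href="..."> found in `html`
 * into `out` (always NUL-terminated, truncated if `out` is too small).
 * Returns 0 on success, -1 if nothing matched or the pattern did not compile.
 * Hypothetical helper: the pattern only handles double-quoted attributes.
 */
int first_href(const char *html, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = captured URL */
    int result = -1;

    if (out_size == 0)
        return -1;

    /* POSIX extended syntax; REG_ICASE so <A HREF=...> matches as well. */
    if (regcomp(&re, "<a[^>]*href=\"([^\"]*)\"", REG_EXTENDED | REG_ICASE) != 0)
        return -1;

    if (regexec(&re, html, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= out_size)
            len = out_size - 1;          /* truncate rather than overflow */
        memcpy(out, html + m[1].rm_so, len);
        out[len] = '\0';
        result = 0;
    }

    regfree(&re);
    return result;
}
```

Because the pattern never looks at closing tags, unclosed or badly nested markup around the link does not matter. That is the appeal for small extractions, and also the reason it stops scaling once you need most of the page's structure.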
