I just need a suggestion. I have a program that takes valid html, and

Question

Asked: June 10, 2026 (2026-06-10T08:34:21+00:00)

I just need a suggestion. I have a program that takes valid HTML and saves it to a file. I need a way to parse this HTML file and retrieve every image referenced in it (e.g. /foo/bar.jpg). Is there an HTML parsing library that I could use to achieve this?

1 Answer

Editorial Team · Answer 1 · 2026-06-10T08:34:23+00:00

Half an answer: there’s a Java parser called TagSoup which will “Just Keep On Truckin'”: it parses anything with angle brackets and always produces a valid set of events for the application.

I mention this because I know that the idea and, crucially, the name have been adopted by libraries with the same intention in other languages. I can’t find a C version right now, but you may have more luck if you try some inventive searches with that starting point. The point is that the application which sits atop the parser doesn’t have to care about the horrors in the original source; it can pretend that the input was well-formed XML and do XMLish things to and with it.

Edit: oooh, and there we go: Taggle (C++, but possibly close enough, and that posting suggests that porting it from Java wasn’t hard).
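
To make the "parser emits events, application just listens" idea concrete, here is a minimal sketch of pulling image paths out of HTML. It is not TagSoup or Taggle: it uses Python's standard-library `html.parser` as a stand-in, which is similarly event-driven and tolerant of sloppy markup. The names `ImageCollector` and `extract_image_sources` are made up for this illustration, and the sketch assumes that only the `src` attribute of `<img>` tags is wanted.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Hypothetical helper: records the src of every <img> start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling us;
        # attrs is a list of (name, value) pairs.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def extract_image_sources(html_text):
    """Return the src values of all <img> tags, in document order."""
    parser = ImageCollector()
    parser.feed(html_text)
    parser.close()
    return parser.sources


# Deliberately untidy input: unclosed <div>, uppercase tag and attribute.
sample = '<p>Intro<img src="/foo/bar.jpg"><div><IMG SRC="/baz/qux.png"></p>'
print(extract_image_sources(sample))  # ['/foo/bar.jpg', '/baz/qux.png']
```

The same shape should carry over to a SAX-style parser in another language: register a start-element handler, check for `img`, and read its `src` attribute.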
