I looked through some Java HTML parsers (Jericho, HtmlCleaner, …) but I couldn’t

Question


Asked: June 2, 2026


I looked through some Java HTML parsers (Jericho, HtmlCleaner, …) but I couldn’t find a feature that, when retrieving a page, would replace the HTML frame tag with the actual source code of the framed page.

Does anyone know of a parser that does that?

Answer:

As Phani indicated, I need an HTML scraper (not a parser or a cleaner).

HtmlUnit seems to do the trick: http://htmlunit.sourceforge.net/frame-howto.html
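To illustrate what "replacing the frame tag with the actual source" involves, here is a minimal, stdlib-only sketch (Java 11+). It is not HtmlUnit's approach and not part of Jericho, HtmlCleaner or HtmlUnit: `FrameInliner`, `inline` and `httpFetcher` are hypothetical names. It assumes frame tags are reasonably well-formed, because it finds them with a regular expression rather than a real HTML parser, and it does not run JavaScript.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Jericho, HtmlCleaner or HtmlUnit) that replaces each
 * frame tag with the HTML of the page its src attribute points to.
 * Assumes frame tags are reasonably well-formed; a regex is not a full HTML parser.
 */
final class FrameInliner {
    // <frame ... src="..." ...> or src='...', any case. The "\\b" after "frame" keeps
    // <frameset> from matching; <iframe> is not matched either.
    private static final Pattern FRAME = Pattern.compile(
            "<frame\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1[^>]*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the HTML at a URI, or null if it could not be fetched.
    private final Function<URI, String> fetcher;

    FrameInliner(Function<URI, String> fetcher) {
        this.fetcher = fetcher;
    }

    /** Inlines frames in html, resolving relative src values against pageUri. */
    String inline(URI pageUri, String html, int maxDepth) {
        if (maxDepth <= 0) {
            return html; // depth limit: self-referencing framesets cannot recurse forever
        }
        Matcher m = FRAME.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            URI frameUri = pageUri.resolve(m.group(2).trim());
            String frameHtml = fetcher.apply(frameUri);
            String replacement = (frameHtml == null)
                    ? m.group()                                  // fetch failed: keep the tag
                    : inline(frameUri, frameHtml, maxDepth - 1); // framed pages may nest frames
            // quoteReplacement stops '$' and '\' in page content being read as group refs
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Fetcher backed by java.net.http (Java 11+); non-200 responses and errors yield null. */
    static Function<URI, String> httpFetcher() {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        return uri -> {
            try {
                HttpResponse<String> resp = client.send(
                        HttpRequest.newBuilder(uri).GET().build(),
                        HttpResponse.BodyHandlers.ofString());
                return resp.statusCode() == 200 ? resp.body() : null;
            } catch (IOException e) {
                return null;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        };
    }
}
```

Typical use would be `new FrameInliner(FrameInliner.httpFetcher()).inline(URI.create(url), html, 3)`. The fetcher is a plain `Function` so it can be swapped for a cache or a stub. Note that the output nests complete documents inside the original `<frameset>`, so it is not valid HTML on its own; running it through a cleaner afterwards is one way to tidy it. A scraper such as HtmlUnit also executes scripts, which this sketch does not.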


1 Answer

Editorial Team · Answer 1 · 2026-06-02T00:19:54+00:00


From your use case, you need a scraper rather than a cleaner.

Cleaner: HTML found on the web is usually dirty, ill-formed, and unsuitable for further processing. For any serious consumption of such documents, it is necessary to first clean up the mess and bring order to tags, attributes, and ordinary text.
