I looked through some Java HTML parsers (Jericho, HtmlCleaner, …) but I couldn’t find a feature that, when retrieving a page, replaces each HTML frame tag with the actual source code of the framed page.
Does anyone know of a parser that does that?
Answer:
As Phani indicated, I need an HTML scraper (not a parser or cleaner).
HtmlUnit seems to do the trick: http://htmlunit.sourceforge.net/frame-howto.html
For your use case, you need a scraper rather than a cleaner.
Cleaner – HTML found on the web is usually dirty, ill-formed and unsuitable for further processing. For any serious consumption of such documents, it is necessary to first clean up the mess and bring order to the tags, attributes and ordinary text.
Scraper – Reads pages programmatically and edits the HTML pages.
http://sourceforge.net/projects/htmlscraper/
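To make the goal from the question concrete, here is a minimal sketch in plain Java of what "replacing the frame tag with the actual source" could look like. It is not how HtmlUnit or any of the libraries above work internally. The class name FrameInliner, the method inlineFrames and the fetcher function are hypothetical names made up for illustration; a real scraper would plug in its own page retrieval where the fetcher is. The regex matching is fragile on real-world HTML, and nested frames are not followed.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FrameInliner {

    // Matches a <frame ...> or <iframe ...> tag and captures its src value.
    // Assumption: src is quoted. An immediately following closing tag is consumed too.
    private static final Pattern FRAME_TAG = Pattern.compile(
            "<i?frame\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>(?:\\s*</i?frame>)?",
            Pattern.CASE_INSENSITIVE);

    /**
     * Returns html with every frame tag replaced by whatever the fetcher
     * returns for that frame's src. If the fetcher returns null, the
     * original tag is left in place.
     */
    public static String inlineFrames(String html, Function<String, String> fetcher) {
        Matcher m = FRAME_TAG.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start());           // text before the tag
            String content = fetcher.apply(m.group(1));  // source of the framed page
            out.append(content != null ? content : m.group());
            last = m.end();
        }
        out.append(html, last, html.length());           // text after the last tag
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical in-memory "site" standing in for real HTTP requests.
        Map<String, String> pages = Map.of(
                "menu.html", "<ul><li>Home</li></ul>",
                "body.html", "<p>Hello</p>");
        String page = "<frameset><frame src=\"menu.html\"><frame src=\"body.html\"></frameset>";
        System.out.println(inlineFrames(page, pages::get));
    }
}
```

Passing the fetcher as a function keeps the replacement logic separate from how pages are retrieved, so the same method could be fed by a scraper library or by a plain HTTP client.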