Is there a way to read the HTML output of a given webpage using Java?
I know that in PHP you can do something like:
$handle = @fopen("http://www.google.com", "r");
$source_code = fread($handle,9000);
I am looking for the Java equivalent.
Additionally, once I have the HTML, are there any Java utilities that would allow me to strip out a single div by its id?
Thanks for any help with this.
Use jsoup.
You have the choice between a tree model and a powerful query syntax similar to CSS or jQuery selectors, plus utility methods to quickly get the source of a webpage.
To quote from their website:
Once you have found the Element representing the div you want to remove, just call remove() on it.
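As a rough sketch of how that could look (not taken from the jsoup site): the class name RemoveDivById, the helper removeById, and the id "footer" are made up for illustration, and the URL is simply the one from the question. It assumes the jsoup jar is on your classpath.

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class RemoveDivById {

    // Hypothetical helper: removes the element with the given id (if any)
    // from the document and returns the remaining HTML as a string.
    static String removeById(Document doc, String id) {
        Element target = doc.getElementById(id); // null when no element has that id
        if (target != null) {
            target.remove(); // detaches the element and its children from the tree
        }
        return doc.outerHtml();
    }

    public static void main(String[] args) throws IOException {
        // Fetch and parse the page; the URL and the id "footer" are placeholders.
        Document doc = Jsoup.connect("http://www.google.com").get();
        System.out.println(removeById(doc, "footer"));
    }
}
```

Note that jsoup parses the HTML the server sends and does not execute JavaScript, so anything a page builds client-side will not be in the document. If you prefer the selector syntax, doc.select("div#footer") should find the same element.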