I would like to read a website URL and output the HTML code of the page to a string. After that, I would like to search for URLs within that string and output them to another string. For now, I only need help with getting the HTML code into a string.
Thank you in advance. I have the following code. Is it correct?
URL url = new URL("http://www.example.com/");
URLConnection con = url.openConnection();
InputStream in = con.getInputStream();
String encoding = con.getContentEncoding();
encoding = encoding == null ? "UTF-8" : encoding;
String body = IOUtils.toString(in, encoding);
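One thing to watch in the snippet above: `getContentEncoding()` returns the `Content-Encoding` header (typically a compression scheme such as `gzip`), not the character set. The charset normally arrives as a parameter of the `Content-Type` header. Below is a minimal sketch using only the JDK (Java 9+ for `readAllBytes`). The names `PageFetcher`, `charsetFrom` and `fetch` are made up for illustration, and falling back to UTF-8 is an assumption, not something the server guarantees.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageFetcher {

    // Pull the charset out of a Content-Type value such as
    // "text/html; charset=ISO-8859-1". Falls back to UTF-8 (an assumption)
    // when the header is missing, has no charset, or names an unknown one.
    public static Charset charsetFrom(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported charset name
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Download the page body and decode it with the declared charset.
    public static String fetch(String address) throws IOException {
        URLConnection con = new URL(address).openConnection();
        try (InputStream in = con.getInputStream()) {
            return new String(in.readAllBytes(), charsetFrom(con.getContentType()));
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch("http://www.example.com/"));
    }
}
```

This does not handle compressed responses or a charset declared only in a `<meta>` tag inside the page, so treat it as a starting point.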
I have used the Jericho parsing library, which turned out to be very handy.
It allows you to browse the HTML tags of the document and access the tags' attributes.
For example, to get all the links' URLs (please check the exact syntax in the documentation):
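If you would rather not pull in a library just to try the idea, a rough stand-in using only `java.util.regex` might look like the sketch below. Note that this is not Jericho's API: it is a plain regular expression that only picks up quoted `href` values inside `<a>` tags, and `LinkExtractor` and `extractLinks` are hypothetical names. A real HTML parser will cope with far more cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {

    // Matches href="..." or href='...' inside an <a ...> tag.
    // Group 1 is the quote character, group 2 is the URL.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\shref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Collect every quoted href value found in anchor tags, in document order.
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://a.example/\">A</a> <A HREF='/b'>B</A></p>";
        System.out.println(extractLinks(html)); // prints [http://a.example/, /b]
    }
}
```

Relative links such as `/b` come back as written, so they would still need to be resolved against the page URL before use.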