I’ve written a Java program which scrapes some content from a web page. It retrieves the content by calling the readWebpage method every couple of seconds. The problem I’m having is that only the first read actually works. After the first read, the InputStream always appears to be empty (in.ready() returns false).
Also, conn.getContentLength() returns the same value every time, even though the content on the page has changed. If I restart the program, the new content is fetched properly.
What have I missed? Do I have to perform some sort of refresh on the conn object?
private String readWebpage(HttpURLConnection conn) throws IOException {
    conn.connect();
    // getContentLength() returns -1 when the length is unknown, so guard the capacity.
    StringBuilder b = new StringBuilder(Math.max(conn.getContentLength(), 0) + 5);
    // Closing the BufferedReader also closes the wrapped reader and stream.
    try (BufferedReader buffer = new BufferedReader(
            new InputStreamReader(conn.getInputStream()))) {
        String line;
        while ((line = buffer.readLine()) != null) {
            b.append(line);
        }
    }
    return b.toString();
}
Are you passing in the same HttpURLConnection object every time? If so, that is the problem: the InputStream is tied to the underlying HTTP connection, so you get the same, already consumed InputStream every time rather than a new stream for the URL. An HttpURLConnection instance is meant for a single request, which is also why getContentLength() keeps reporting the value from the first response. Open a new connection (URL#openConnection) before each call to this method and you should be good to go.
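To illustrate, here is a minimal sketch of the method reworked to take the URL and open a fresh connection on every call. The class name PageReader, the UTF-8 charset, and the setUseCaches(false) call are my own assumptions and not part of the original code, so adjust them to your page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is only for illustration.
final class PageReader {

    // Opens a new HttpURLConnection per call, because one instance
    // serves a single request and its stream can be read only once.
    static String readWebpage(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Assumption: the live page is always wanted, so skip any cache.
        conn.setUseCaches(false);

        StringBuilder b = new StringBuilder();
        // Assumption: the page is UTF-8 encoded; change the charset if not.
        try (BufferedReader buffer = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = buffer.readLine()) != null) {
                // readLine() strips the line terminator, so add one back.
                b.append(line).append('\n');
            }
        }
        return b.toString();
    }
}
```

In the polling loop you would keep the URL object around and call PageReader.readWebpage(url) every couple of seconds, instead of holding on to a single conn object.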