I’m developing a web crawler, but often after it has been running for a short time (a few minutes), some threads stop doing their work. Running a debugger, I found that they are stuck in SocketRead0.
This happens when a thread downloads the content of a page with HttpURLConnection.getInputStream().
I don’t know what causes it, but I suspect it is related to the multithreading.
Does anyone know how to solve or avoid this?
I’m not using a pool of HttpURLConnection objects yet because I don’t know how to do that.
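HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
conn.setInstanceFollowRedirects(true);
try {
    conn.connect();
    // CountingInputStream (a third-party class, e.g. from Guava or Commons IO) counts the bytes read;
    // try-with-resources closes it even if processing throws
    try (CountingInputStream content = new CountingInputStream(conn.getInputStream())) {
        //processing of content
    }
    return true;
} catch (Exception e) {
    return false;
}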
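You need to set a socket read timeout on the connection. By default the read timeout is zero, which is interpreted as an infinite timeout, so a server that stops sending data leaves the thread blocked in the read indefinitely. With a timeout set, the read throws a java.net.SocketTimeoutException once the specified period passes without data, instead of hanging.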
See URLConnection.setReadTimeout(int): http://download.oracle.com/javase/1.5.0/docs/api/java/net/URLConnection.html#setReadTimeout(int)
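A minimal sketch, not the asker's actual crawler code, of the download step with timeouts applied. The class name `TimeoutDemo`, the `download` helper, and the timeout values are hypothetical. `setConnectTimeout` is an extra setting beyond what the answer mentions; it bounds the connection phase the same way `setReadTimeout` bounds each read. To show the effect without network access, `main` starts a local server that accepts a connection but never replies, simulating a stalled remote host.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.URL;

public class TimeoutDemo {

    // Hypothetical timeout values; tune them for your crawler.
    static final int CONNECT_TIMEOUT_MS = 5_000;
    static final int READ_TIMEOUT_MS = 500;

    /** Downloads url; returns false instead of blocking forever in a socket read. */
    static boolean download(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setInstanceFollowRedirects(true);
            conn.setConnectTimeout(CONNECT_TIMEOUT_MS); // limit for establishing the TCP connection
            conn.setReadTimeout(READ_TIMEOUT_MS);       // limit for each blocking read
            try (InputStream in = conn.getInputStream()) {
                byte[] buf = new byte[8192];
                while (in.read(buf) != -1) {
                    // processing of content would go here
                }
            }
            return true;
        } catch (SocketTimeoutException e) {
            // Thrown when no data arrives within READ_TIMEOUT_MS
            System.out.println("timed out: " + e.getMessage());
            return false;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Local server that accepts a connection but never sends a response.
        try (ServerSocket server = new ServerSocket(0)) {
            Thread acceptor = new Thread(() -> {
                try (Socket s = server.accept()) {
                    Thread.sleep(5_000); // hold the connection open without replying
                } catch (IOException | InterruptedException ignored) {
                }
            });
            acceptor.setDaemon(true);
            acceptor.start();

            long start = System.nanoTime();
            boolean ok = download("http://127.0.0.1:" + server.getLocalPort() + "/");
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("ok=" + ok + ", elapsed < 5000 ms: " + (elapsedMs < 5_000));
        }
    }
}
```

In a crawler, a worker thread that hits the timeout gets `false` back and can move on to the next URL instead of staying blocked in SocketRead0.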