I have the following Java code to fetch the entire contents of an HTML page at a given URL. Can this be done in a more efficient way? Any improvements are welcome.
public static String getHTML(final String url) throws IOException {
    if (url == null || url.length() == 0) {
        throw new IllegalArgumentException("url cannot be null or empty");
    }

    final HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    final BufferedReader buf = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    final StringBuilder page = new StringBuilder();
    final String lineEnd = System.getProperty("line.separator");
    String line;
    try {
        while (true) {
            line = buf.readLine();
            if (line == null) {
                break;
            }
            page.append(line).append(lineEnd);
        }
    } finally {
        buf.close();
    }
    return page.toString();
}
I can’t help but feel that the line-by-line reading is less than optimal. I know that I’m possibly masking a MalformedURLException (thrown by the new URL(url) constructor, and swallowed into the declared IOException since it is a subclass), and I’m okay with that.
My function also has the side effect of normalizing the line terminators in the returned HTML to whatever is correct for the current system. This isn’t a requirement.
I realize that network I/O will probably dwarf the time spent assembling the string, but I’d still like to know that this part is optimal.
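For comparison, a variant that reads fixed-size chunks of characters instead of lines would avoid both the per-line overhead and the terminator rewriting. An untested sketch (the 4096 chunk size is arbitrary, and the argument check is omitted for brevity):

public static String getHTML(final String url) throws IOException {
    final HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    final Reader in = new InputStreamReader(conn.getInputStream());
    final StringBuilder page = new StringBuilder();
    final char[] buffer = new char[4096];
    try {
        int read;
        // Reader.read returns -1 at end of stream; line terminators
        // pass through unchanged instead of being rewritten.
        while ((read = in.read(buffer)) != -1) {
            page.append(buffer, 0, read);
        }
    } finally {
        in.close();
    }
    return page.toString();
}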
On a side note: it would be awesome if StringBuilder had a constructor taking an open InputStream that would simply read the stream’s entire contents into the StringBuilder.
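Something close to that does exist in newer JDKs. Assuming Java 9+ (for InputStream.readAllBytes) and a UTF-8 page, a sketch of the whole method shrinks to:

public static String getHTML(final String url) throws IOException {
    // readAllBytes requires Java 9+; a robust version would take the
    // charset from the response's Content-Type header instead of assuming UTF-8.
    try (InputStream in = new URL(url).openStream()) {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
}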
As the other answers show, there are many edge cases (HTTP peculiarities, character encodings, chunked transfers, etc.) that any robust solution must account for. For anything other than a toy program, I therefore suggest using the de facto standard Java HTTP library: Apache HttpComponents HttpClient.
They provide many samples. "Just" getting the response body for a GET request looks something like the sketch below, which assumes the 4.x CloseableHttpClient API; the HtmlFetcher class name is only illustrative:
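import java.io.IOException;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public final class HtmlFetcher {
    public static String getHTML(final String url) throws IOException {
        // createDefault() yields a client that follows redirects and
        // transparently handles chunked transfer encoding.
        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse response = client.execute(new HttpGet(url))) {
            // EntityUtils.toString honours the charset declared in the
            // Content-Type header (falling back to a default if absent).
            return EntityUtils.toString(response.getEntity());
        }
    }
}

In real code you would normally create the client once and reuse it across requests rather than build a new one per call.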