I am using this servlet to extract the HTML contents from another domain to

Question

Asked: June 1, 2026 (2026-06-01T23:26:26+00:00)

I am using this servlet to extract the HTML contents from another domain so I can include them in my own page with Ajax. It sets the response encoding to “UTF-8”:

public class ProxyServlet extends HttpServlet {
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException  {
        String urlString = request.getParameter("url");
        try {
            URL url = new URL(urlString);
            url.openConnection();            
            BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
            response.setContentType("text/html; charset=UTF-8");
            PrintWriter out = new PrintWriter(new OutputStreamWriter(response.getOutputStream(), "UTF8"), true);
            char[] buf = new char[4 * 1024];
            int len;
            while ((len = reader.read(buf, 0, buf.length)) != -1) {
              out.write(buf, 0, len);
            }
            out.flush();
        }
        catch (MalformedURLException e) {     
            throw new ServletException(e);
        }
        catch (IOException e) {     
            throw new ServletException(e);
        }
    }
}

The document I am extracting has a meta tag like this:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta>

I copied and pasted it into my own page so it matches exactly. According to the browser's page info, my page is definitely using “UTF-8” encoding. Yet I am still getting “Â” in place of each “&nbsp;” in the extracted HTML contents.

The stray characters are already present in the responseText returned by this ProxyServlet. I thought explicitly defining the response content type and the output stream charset would handle this, but I must be missing something. Has anyone resolved this before?

1 Answer

Editorial Team · Answer 1 · 2026-06-01T23:26:27+00:00

Instead of converting a byte stream to chars and back again, you could just copy from one byte stream to the other via a byte[] buffer. Either use Spring's FileCopyUtils (org.springframework.util), which also closes both streams when it is done:

FileCopyUtils.copy(url.openStream(), response.getOutputStream());

or explicitly:

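// Assumes "url" and "response" are the variables from the question's servlet.
final int BUFFER_SIZE = 4 * 1024; // arbitrary buffer size
OutputStream out = response.getOutputStream();
try (InputStream in = url.openStream()) {
    byte[] buffer = new byte[BUFFER_SIZE];
    int bytesRead;
    while ((bytesRead = in.read(buffer)) != -1) {
        out.write(buffer, 0, bytesRead); // raw bytes, no charset decoding
    }
    out.flush();
}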

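This guarantees that the data is copied byte for byte, so no charset conversion can corrupt it. In the question's servlet, new InputStreamReader(url.openStream()) decodes with the platform default charset. If that default is not UTF-8 (for example ISO-8859-1 or windows-1252), the UTF-8 bytes of a non-breaking space (0xC2 0xA0) are read as two characters, “Â” and a non-breaking space, and both are then re-encoded as UTF-8 on the way out. If you do need a Reader, pass the charset explicitly: new InputStreamReader(url.openStream(), "UTF-8").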
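To make the difference concrete, here is a minimal, self-contained sketch that needs no servlet container. The class and method names (ByteCopyDemo, copy, roundTripThroughLatin1) are made up for illustration, and it assumes the proxy's platform default charset is ISO-8859-1, which the question does not actually state. It compares a plain byte copy with a decode and re-encode round trip through the wrong charset:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class ByteCopyDemo {

    // Copies every byte from in to out without ever decoding to characters.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4 * 1024];
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);
        }
        out.flush();
    }

    // Mimics a Reader/Writer proxy under the ASSUMPTION that the default
    // charset is ISO-8859-1: decode UTF-8 bytes with the wrong charset,
    // re-encode as UTF-8, then view the result the way a browser would.
    static String roundTripThroughLatin1(byte[] utf8Bytes) {
        String misdecoded = new String(utf8Bytes, StandardCharsets.ISO_8859_1);
        byte[] reencoded = misdecoded.getBytes(StandardCharsets.UTF_8);
        return new String(reencoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "a", a non-breaking space (U+00A0), "b", encoded as UTF-8.
        byte[] source = "a\u00A0b".getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(source), sink);

        String viaBytes = new String(sink.toByteArray(), StandardCharsets.UTF_8);
        String viaChars = roundTripThroughLatin1(source);

        // The byte copy keeps 3 characters; the round trip yields 4,
        // with a stray U+00C2 ("Â") in front of the non-breaking space.
        System.out.println("byte copy length:  " + viaBytes.length());
        System.out.println("round trip length: " + viaChars.length());
    }
}
```

The byte copy never interprets the payload, so it stays correct whatever encoding the remote page uses, provided the Content-Type header you send matches the encoding the page was actually served in.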
