I’m trying to download a web page in Java with the following: URL url

Asked: May 16, 2026 (2026-05-16T20:00:02+00:00)

I’m trying to download a web page in Java with the following:

URL url = new URL("http://www.jksfljasdlfas.com"); // a scheme is required, or new URL(...) throws MalformedURLException
File to = new File("/home/test/test.html");

Reader in = new InputStreamReader(url.openStream(), "UTF-8");
Writer out = new OutputStreamWriter(new FileOutputStream(to), "UTF-8");

int c;
while((c = in.read()) != -1){
    out.write(c);
}
in.close();
out.close();

The page downloads, but some characters come out as entities. This:
<a href="http://www.generation276.org/film/?m=200812&paged=2" >Pagina successiva »</a>
becomes this:
<a href="http://www.generation276.org/film/?m=200812&#038;paged=2" >Pagina successiva &raquo;</a>
When I download the same page with Chrome, the & stays &.
I’m new to charsets and encodings; can anybody see the problem?

1 Answer

Editorial Team · Answer 1 · 2026-05-16T20:00:03+00:00

The Java part is working perfectly fine.

Chrome is tricking you there. In Firefox, when I select View -> Page Source, I see this:

<a href="http://www.generation276.org/film/?m=200812&#038;paged=3" >
Pagina successiva &raquo;</a>

while with Firebug / Inspect Element I see this:

<a href="http://www.generation276.org/film/?m=200812&paged=3" style="">
Pagina successiva »</a>

and it copies to the clipboard as this:

<a href="http://www.generation276.org/film/?m=200812&amp;paged=3" style="">
Pagina successiva »</a>

Browsers don’t always show you what’s really there.
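
The claim that the Java code is not the culprit can be checked without a network connection. The sketch below runs the same character-by-character loop as the question over an in-memory string instead of a URL stream. The class name CopyDemo and the sample string are assumptions made for illustration; the point is only that a Reader/Writer copy passes entities through untouched.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class CopyDemo {
    // Copies characters one at a time, like the loop in the question,
    // but from a String to a String so it runs offline.
    static String copy(String source) {
        try (Reader in = new StringReader(source);
             Writer out = new StringWriter()) {
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
            return out.toString();
        } catch (IOException e) {
            // Not expected for in-memory streams; rethrown unchecked for simplicity.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: the markup as the server sends it, entities included.
        String served = "<a href=\"http://www.generation276.org/film/?m=200812&#038;paged=3\" >"
                + "Pagina successiva &raquo;</a>";
        // The copy is identical to the input: nothing is encoded or decoded.
        System.out.println(copy(served).equals(served)); // prints "true"
    }
}
```

If the entities are in the saved file, they were already in the bytes the server sent.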

The second part of your question is identical to this previous question:

Java: How to decode HTML character entities in Java like HttpUtility.HtmlDecode?

And hence the answer is also the same:

Use StringEscapeUtils.unescapeHtml(String) from the Apache Commons Lang project (in Commons Lang 3 the method is named unescapeHtml4).
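
To show what decoding does to the link above, here is a minimal standard-library sketch. It is not the Apache Commons implementation and not a replacement for it: the class EntityDecodeSketch, its decode method, and the five-entry name table are hypothetical and cover only numeric references plus a handful of named entities. Real HTML defines far more, which is why a library is the better choice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecodeSketch {
    // Deliberately tiny, hypothetical table; HTML defines many more named entities.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("raquo", "\u00BB");
    }

    // Matches &#123; (decimal), &#x7B; (hex) and &name; references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    static String decode(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            String replacement = m.group(); // default: leave the text unchanged
            try {
                if (body.startsWith("#x") || body.startsWith("#X")) {
                    replacement = fromCodePoint(Integer.parseInt(body.substring(2), 16), replacement);
                } else if (body.startsWith("#")) {
                    replacement = fromCodePoint(Integer.parseInt(body.substring(1)), replacement);
                } else {
                    replacement = NAMED.getOrDefault(body, replacement);
                }
            } catch (NumberFormatException e) {
                // Number too large to parse: keep the original reference.
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Converts a code point to a String, or returns the fallback if it is invalid.
    private static String fromCodePoint(int codePoint, String fallback) {
        return Character.isValidCodePoint(codePoint)
                ? new String(Character.toChars(codePoint))
                : fallback;
    }

    public static void main(String[] args) {
        String raw = "?m=200812&#038;paged=3\" >Pagina successiva &raquo;</a>";
        System.out.println(decode(raw)); // ?m=200812&paged=3" >Pagina successiva »</a>
    }
}
```

Decoding the whole file this way would also unescape entities that need to stay escaped for the HTML to remain valid, so it is usually applied to extracted text or attribute values rather than to the full page.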
