My client uses InputStreamReader/BufferedReader to fetch text from the Internet. However when I save

Question

Asked: June 1, 2026 (2026-06-01T19:14:52+00:00)

My client uses InputStreamReader/BufferedReader to fetch text from the Internet.
However, when I save the text to a *.txt file, it shows extra odd symbols such as ‘Â’.

I’ve tried converting the String to ASCII, but that messes up å, ä, ö and Ø, which I use.
I’ve tried food = food.replace("Â", ""); and indexOf(),
but the String never finds it, even though it’s there in a hex editor.

So, in summary: when I use text.setText() on Android, the output looks fine with no odd symbols, but when I save the text to a *.txt file I get about four ‘Â’ characters. I do not want ASCII because I use other non-ASCII characters.
The ‘Â’ is displayed as whitespace on my Android device and in Notepad.

Thanks!

Have a great weekend!

EDIT:
Solved it by replacing all non-breaking spaces with ordinary spaces:

myString = myString.replaceAll("\\u00a0", " ");

1 Answer

Editorial Team · Answer 1 · 2026-06-01T19:14:55+00:00

You say that you are fetching like this:

in = new BufferedReader(new InputStreamReader(url.openStream(),"UTF-8"));

There is a fair chance that the stuff you are fetching is not encoded in UTF-8.

You need to call getContentType() on the HttpURLConnection object, and if it is non-null, extract the encoding and use it when you create the InputStreamReader. Only assume “UTF-8” if the response doesn’t supply a content type with a valid encoding.
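As a rough illustration of that advice, here is a minimal sketch of pulling the charset out of a Content-Type value and falling back to UTF-8. The class and method names (CharsetFromContentType, charsetOf) are made up for this example, and the parsing is deliberately simple; it is not a complete header parser.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetFromContentType {

    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=ISO-8859-1". Falls back to UTF-8 when the value
    // is null, has no charset parameter, or names a charset that is
    // illegal or not supported on this platform.
    static Charset charsetOf(String contentType) {
        if (contentType == null) {
            return StandardCharsets.UTF_8;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return StandardCharsets.UTF_8;
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1"));
        System.out.println(charsetOf("text/plain"));
        System.out.println(charsetOf(null));
    }
}
```

With a URLConnection named conn (an assumed variable), the reader could then be built as new BufferedReader(new InputStreamReader(conn.getInputStream(), charsetOf(conn.getContentType()))).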

On reflection, while you SHOULD pay attention to the content type returned by the server, the real problem is either in the way that you are writing the *.txt file, or in the display tool that is showing strange characters.

It is not clear what encoding you are using to write the file. Perhaps you have chosen the wrong one.
It is possible that the display tool is assuming that the file has a different encoding. Maybe it detects that a file is UTF-8 or UTF-16 only if there is a BOM.
It is possible that the display tool is plain broken, and doesn’t understand non-breaking spaces.
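One way to rule out the first two possibilities is to choose the output encoding explicitly rather than relying on the platform default, and optionally to start the file with a BOM. The following is only a sketch under those assumptions; the class name WriteUtf8, the write method and the file name are invented for the example, and whether a given viewer honours the BOM depends on that viewer.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriteUtf8 {

    // Writes text to the given path as UTF-8. When withBom is true, the
    // file begins with the bytes EF BB BF, which some tools treat as a
    // hint that the file is UTF-8.
    static void write(String path, String text, boolean withBom) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(path), StandardCharsets.UTF_8)) {
            if (withBom) {
                out.write('\uFEFF'); // U+FEFF encodes to EF BB BF in UTF-8
            }
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name; the text contains a non-breaking space.
        write("food.txt", "sm\u00f6r\u00a0och br\u00f6d", true);
    }
}
```

If the file still shows ‘Â’ in a tool after this, the remaining suspect is the tool's own choice of encoding when it reads the file.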

When you display files using a HEX editor, it is most likely using an 8-bit character set to render bytes, and that character set is most likely Latin-1. But apparently, the file is actually encoded differently.

Anyway, the approach of replacing non-breaking spaces is (IMO) a hack, and it won’t deal with other stuff that you might encounter in the future. So I recommend that you take the time to really understand the problem, and fix it properly.

Finally, I think I understand why you might be getting Â characters. A Unicode NON-BREAKING-SPACE character is U+00A0. When you encode that as UTF-8, you get the two bytes C2 A0. But C2 in Latin-1 is CAPITAL-A-CIRCUMFLEX, and A0 in Latin-1 is NON-BREAKING-SPACE. So the “confusion” is most likely that your program is writing the *.txt file in UTF-8 and the tool is reading it as Latin-1.
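The byte-level explanation above can be reproduced in a few lines. This is just a demonstration of the suspected mismatch, with invented names (NbspMojibake, utf8ReadAsLatin1, hex); it does not show what any particular viewer actually does.

```java
import java.nio.charset.StandardCharsets;

public class NbspMojibake {

    // Encodes the text as UTF-8, then decodes the same bytes as Latin-1
    // (ISO-8859-1), imitating a tool that guesses the wrong encoding.
    static String utf8ReadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nbsp = "\u00a0";
        // The UTF-8 encoding of a non-breaking space is two bytes.
        System.out.println(hex(nbsp.getBytes(StandardCharsets.UTF_8)));
        // Misread as Latin-1, those two bytes become two characters:
        // capital A with circumflex followed by a non-breaking space.
        String misread = utf8ReadAsLatin1(nbsp);
        System.out.println(misread.length());
        System.out.println(misread.charAt(0) == '\u00c2');
    }
}
```

This would also explain why food.replace("Â", "") found nothing: under this reading, the String in memory only ever contains U+00A0, and the Â appears only when the UTF-8 bytes on disk are decoded as Latin-1.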
