I am trying to extract a list of airports from an XML file, which is available at this address: http://mobilite.euroairport.com/services/getDepartureAirports?language=French
My problem is that the ‘Ü’ that should appear in “DÜSSELDORF” cannot be read, even when the file is opened directly in IE or Firefox.
I obtain something like this:
D□SSELDORF or D SSELDORF or D?SSELDORF
The following is the code I used to try to find the encoding of this file (n is the string that contains “DÜSSELDORF”):
byte[] bytes = n.getBytes();
Log.w("tagtag", (char) bytes[0] + " "+(char) bytes[1]+" "+(char) bytes[2]);
bytes = n.getBytes("ASCII");
Log.w("tagtag", (char) bytes[0] + " "+(char) bytes[1]+" "+(char) bytes[2]);
bytes = n.getBytes("Cp1252");
Log.w("tagtag", (char) bytes[0] + " "+(char) bytes[1]+" "+(char) bytes[2]);
bytes = n.getBytes("UTF-8");
Log.w("tagtag", (char) bytes[0] + " "+(char) bytes[1]+" "+(char) bytes[2]);
bytes = n.getBytes("ISO8859_1");
Log.w("tagtag", (char) bytes[0] + " "+(char) bytes[1]+" "+(char) bytes[2]);
bytes = n.getBytes("ISO8859_2");
Log.w("tagtag", (char) bytes[0] + " "+(char) bytes[1]+" "+(char) bytes[2]);
And this is the result (in Logcat on Android):
10-08 09:41:30.557: W/tagtag(1506): D □ ン
10-08 09:41:30.557: W/tagtag(1506): D ? S
10-08 09:41:30.567: W/tagtag(1506): D ン S
10-08 09:41:30.567: W/tagtag(1506): D □ ン
10-08 09:41:30.577: W/tagtag(1506): D ン S
10-08 09:41:30.637: W/tagtag(1506): D ン S
My question is: am I making a mistake when reading this string, or is the problem on the server side?
This is definitely a server (data service) misconfiguration issue or bug.
The server returns this line in the HTML/XML response:
I just inspected a byte dump of the XML; this is how Wireshark represents “DSSELDORF”:
In the hex dump (see a UTF-8 code table for the hex value c2 9d):
which would be:
and
C2 9D
is interpreted as a control character (that is, a non-printable character), hence the “missing” Ü, which also explains your Logcat output.
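To check this locally, here is a minimal sketch in plain Java (the class and method names are made up for illustration and are not part of the original code). It decodes the two bytes C2 9D as UTF-8 and prints the resulting code point. It also shows a likely reason for the odd symbols in the log: casting a negative byte to char sign-extends it, so 0x9D becomes U+FF9D (a halfwidth katakana letter) instead of U+009D. Masking with 0xFF before printing avoids that.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8ControlCharDemo {

    // Decode raw bytes as UTF-8, the way a parser would for a UTF-8 document.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Format a byte as two hex digits; the 0xFF mask prevents sign extension.
    static String hex(byte b) {
        return String.format("%02X", b & 0xFF);
    }

    public static void main(String[] args) {
        // The two bytes seen on the wire where the Ü should be.
        byte[] onWire = {(byte) 0xC2, (byte) 0x9D};

        String decoded = decodeUtf8(onWire);
        int cp = decoded.codePointAt(0);
        System.out.printf("code point U+%04X, control character: %b%n",
                cp, Character.isISOControl(cp));

        // Re-encoding to ISO-8859-1 yields the single byte 0x9D.
        byte[] latin1 = decoded.getBytes(StandardCharsets.ISO_8859_1);

        // (char) on a negative byte sign-extends: 0x9D becomes U+FF9D.
        char misleading = (char) latin1[0];
        System.out.printf("cast to char: U+%04X, masked hex: %s%n",
                (int) misleading, hex(latin1[0]));
    }
}
```

So when inspecting bytes in Logcat, printing them as hex with the mask is probably more reliable than casting them to char.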