I have the following code:
System.out.println(new String("–’".getBytes("ISO8859_15_FDIS")));
The two characters in the first String are:
- – (96 in hex)
- ’ (92 in hex)
The output of the code is:
??
Which is to be expected as the code page for ISO8859_15_FDIS shows that the two characters above are not in the printable part of the table.
What I would like to get is an acceptable mapping:
-'
(i.e. 2D and 27 in ISO8859_15_FDIS)
Is there a way to perform this conversion within the standard Java API, or am I just going to have to store some kind of explicit mapping (Map<Character,Character>) between the actual value and the wanted value?
For a bit of context, we have a Sybase database that is using this character set, and when users paste those characters into text areas on the fronting web app, they end up as question marks in the database.
Code like the new String(... .getBytes(...)) round trip in the question is never correct. It is always a transcoding bug.
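The correct way to transcode these code points from windows-1252 to ISO8859_15_FDIS is to decode the source bytes into a String using the source encoding, and then encode that String into bytes using the target encoding.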
Java chars are always implicitly UTF-16 and all other encodings should be represented using byte arrays.
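A minimal sketch of that decode-then-encode step, assuming the input arrives as the windows-1252 bytes 0x96 and 0x92 from the question. The class and method names are illustrative, and the standard charset name ISO-8859-15 is used here in place of the ISO8859_15_FDIS alias:

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class TranscodeExample {
    // Decode the bytes with the source charset (giving a UTF-16 String),
    // then encode that String with the target charset.
    static byte[] transcode(byte[] input, Charset source, Charset target) {
        String utf16 = new String(input, source);
        return utf16.getBytes(target);
    }

    public static void main(String[] args) {
        // 0x96 and 0x92 are the en dash and right single quote in windows-1252.
        byte[] windows1252 = { (byte) 0x96, (byte) 0x92 };
        byte[] latin9 = transcode(
                windows1252,
                Charset.forName("windows-1252"),
                Charset.forName("ISO-8859-15"));
        // Unmappable characters are replaced by the encoder's replacement byte.
        System.out.println(Arrays.toString(latin9));
    }
}
```

With these two inputs the result is still two question marks (byte value 0x3F), because String.getBytes substitutes the charset's replacement byte for anything the target encoding cannot represent.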
However, ISO-8859-15 does not support the two code points (U+2013 and U+2019) so this will be a lossy process. The values you are expecting (U+002D and U+0027) have identical byte values in both encodings.
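One way to detect the loss before it happens is to ask the target charset's encoder whether it can represent the text. This is only a sketch; the class and method names are hypothetical, while CharsetEncoder.canEncode is part of the standard java.nio.charset API:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodableCheck {
    // Returns true if every character in the text can be encoded
    // in the given charset without substitution.
    static boolean isEncodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        Charset latin9 = Charset.forName("ISO-8859-15");
        // U+2013 (en dash) and U+2019 (right single quotation mark)
        System.out.println(isEncodable("\u2013\u2019", latin9));
        // U+002D (hyphen-minus) and U+0027 (apostrophe)
        System.out.println(isEncodable("-'", latin9));
    }
}
```

A check like this could be used to reject or flag input at the web tier instead of silently storing question marks.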
These are just completely different code points and you will have to maintain some form of normalization routine to map characters that have visually similar graphemes.
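A minimal sketch of such a normalization routine, using the explicit Map<Character, Character> idea from the question. The class name and the two-entry table are hypothetical and deliberately incomplete; a real table would need an entry for every character you want to substitute:

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

public class Latin9Normalizer {
    // Hypothetical substitution table: visually similar replacements
    // for characters that ISO-8859-15 cannot represent.
    private static final Map<Character, Character> SUBSTITUTES = new HashMap<>();
    static {
        SUBSTITUTES.put('\u2013', '-');  // EN DASH -> HYPHEN-MINUS
        SUBSTITUTES.put('\u2019', '\''); // RIGHT SINGLE QUOTATION MARK -> APOSTROPHE
    }

    // Replaces every character that has an entry in the table and
    // leaves all other characters untouched.
    static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            Character mapped = SUBSTITUTES.get(c);
            out.append(mapped != null ? mapped.charValue() : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pasted = "\u2013\u2019";
        byte[] latin9 = normalize(pasted).getBytes(Charset.forName("ISO-8859-15"));
        System.out.printf("%02X %02X%n", latin9[0], latin9[1]);
    }
}
```

Running the normalization before encoding yields the bytes 2D and 27 that the question asks for. Characters outside the table are passed through unchanged, so anything else that ISO-8859-15 cannot represent will still end up as a question mark.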