I'm trying to decode the character · using the GB2312 charset in Java.
This character is contained in GB2312; its code is a1a4 (check here).
Code:
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.ByteBuffer;
    import java.nio.charset.Charset;

    public class Main {
        private static final int bufferSize = 0x20000;

        public static void main(String[] _args) throws Exception {
            String str = "a1a4:· a5f6:ヶ a8c5:ㄅ";
            // Note: getBytes() with no argument encodes with the platform default charset.
            ByteBuffer bf = readToByteBuffer(new ByteArrayInputStream(str.getBytes()));
            System.out.println(Charset.forName("GB2312").decode(bf).toString());
        }

        // Reads the whole stream into memory and wraps the bytes in a ByteBuffer.
        static ByteBuffer readToByteBuffer(InputStream inStream) throws IOException {
            byte[] buffer = new byte[bufferSize];
            ByteArrayOutputStream outStream = new ByteArrayOutputStream(bufferSize);
            int read;
            while ((read = inStream.read(buffer)) != -1) {
                outStream.write(buffer, 0, read);
            }
            return ByteBuffer.wrap(outStream.toByteArray());
        }
    }
The code above outputs:
a1a4:? a5f6:ヶ a8c5:ㄅ
I don't understand why a1a4 can't be decoded.
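One way to narrow this down is to take the platform default charset and the console encoding out of the picture: decode the two bytes directly and print the resulting code point in hex. The following is only a minimal sketch; the class name DecodeProbe and the helper decodeToLabels are made up for illustration, and it assumes the JRE ships a charset named "GB2312".

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

final class DecodeProbe {
    // Hypothetical helper: decodes raw bytes with the named charset and
    // returns the decoded code points as space-separated "U+XXXX" labels.
    static String decodeToLabels(byte[] bytes, String charsetName) {
        String decoded = Charset.forName(charsetName)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
        StringBuilder sb = new StringBuilder();
        decoded.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The bytes a1 a4 from the question, given explicitly instead of via getBytes().
        byte[] bytes = {(byte) 0xA1, (byte) 0xA4};
        // Assumes a "GB2312" charset is available in this runtime.
        System.out.println(decodeToLabels(bytes, "GB2312"));
    }
}
```

Printing the code point as hex shows which character the decoder actually produced, even when the console cannot display it.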
In my browser, your string d has its fifth character encoded as 0xB7, which is MIDDLE DOT, not KATAKANA MIDDLE DOT. However, according to the same database you mentioned, that code point is not available in the GB2312 character set. Likewise, you can see that neither MIDDLE DOT nor an encoding of 0xB7 is listed as being part of GB2312.

I think the problem here is with the characters in your input string, not in the CharsetDecoder provided by your JRE.
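To check this kind of claim on a given JRE, one option is to ask the charset's encoder which characters of the input it can represent. This is a minimal sketch, not a definitive diagnosis: the class name EncodableCheck and the helper unencodable are invented for illustration, the string literal uses Unicode escapes on the assumption that the question's characters are U+00B7, U+30F6 and U+3105, and it assumes a charset named "GB2312" is available.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

final class EncodableCheck {
    // Hypothetical helper: returns "U+XXXX" labels for every code point
    // in text that the given charset's encoder reports it cannot encode.
    static List<String> unencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        List<String> result = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            String single = new String(Character.toChars(cp));
            if (!encoder.canEncode(single)) {
                result.add(String.format("U+%04X", cp));
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Same text as in the question, written with escapes so the
        // source file encoding does not matter.
        String str = "a1a4:\u00B7 a5f6:\u30F6 a8c5:\u3105";
        // Assumes a "GB2312" charset is available in this runtime.
        System.out.println(unencodable(str, Charset.forName("GB2312")));
    }
}
```

If the list contains U+00B7, the input string holds a character the GB2312 charset of that JRE does not cover, which would match the explanation above.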