I am parsing MP3 tags.
I have a String artist, but I do not know which encoding the original data was in.
Ïåñíÿ ïðî íàäåæäó is an example string; in Russian it should read "Песня про надежду".
I use http://code.google.com/p/juniversalchardet/
Code:
String GetEncoding(String text) throws IOException {
    byte[] buf = new byte[4096];
    InputStream fis = new ByteArrayInputStream(text.getBytes());
    UniversalDetector detector = new UniversalDetector(null);
    int nread;
    while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
        detector.handleData(buf, 0, nread);
    }
    detector.dataEnd();
    String encoding = detector.getDetectedCharset();
    detector.reset();
    return encoding;
}
And then I convert:
new String(text.getBytes(encoding), "cp1251"); but this does not work.
If I use UTF-16:
new String(text.getBytes("UTF-16"), "cp1251") returns "юя П е с н я п р о н а д е ж д у", where the gaps between letters are not the space character.
EDIT:
This is where I first read the bytes:
byte[] abyFrameData = new byte[iTagSize];
oID3DIS.readFully(abyFrameData);
ByteArrayInputStream oFrameBAIS = new ByteArrayInputStream(abyFrameData);
String s = new String(abyFrameData, "????");
Java strings are UTF-16. Text in any other encoding has to be held as a byte sequence. To decode character data, you must provide the encoding when you first create the string from those bytes. If you already have a corrupted string, it is too late.
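To illustrate why the charset has to be supplied at the point where bytes become a String, here is a minimal sketch. It assumes the tag bytes are windows-1251 (cp1251), which is only a guess based on the example string in the question; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeAtSource {
    // Decode raw bytes using an explicitly chosen charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Assumption: the tag was written as windows-1251.
        Charset cp1251 = Charset.forName("windows-1251");

        // The word "Песня" as windows-1251 bytes.
        byte[] raw = {(byte) 0xCF, (byte) 0xE5, (byte) 0xF1, (byte) 0xED, (byte) 0xFF};

        // Decoded with the matching charset: the Cyrillic word.
        System.out.println(decode(raw, cp1251));

        // Decoded with the wrong charset: the garbled "Ïåñíÿ" seen in the question.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1));
    }
}
```

The same five bytes produce either readable text or garbage depending only on the charset passed to the constructor, so the decision belongs where abyFrameData is turned into a String, not afterwards.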
Assuming ID3, the specifications define the rules for encoding. For example, ID3v2.4.0 might restrict the encodings used via an extended header:
Encoding handling is defined further down the document:
Use transcoding classes like InputStreamReader or (more likely in this case) the String(byte[], Charset) constructor to decode the data. See also Java: a rough guide to character encoding.
Parsing the string components of an ID3v2.4.0 data structure would look something like this:
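A minimal sketch, assuming the frame body of an ID3v2.4.0 text frame starts with a one-byte encoding marker (0 = ISO-8859-1, 1 = UTF-16 with BOM, 2 = UTF-16BE, 3 = UTF-8) followed by the encoded text. The class and method names are hypothetical, and frame header parsing is left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Id3TextFrame {
    // Map the ID3v2.4.0 text-encoding byte to a Java charset.
    static Charset charsetFor(int encodingByte) {
        switch (encodingByte) {
            case 0: return StandardCharsets.ISO_8859_1;
            case 1: return StandardCharsets.UTF_16;   // decoder reads and strips the BOM
            case 2: return StandardCharsets.UTF_16BE; // no BOM, big-endian
            case 3: return StandardCharsets.UTF_8;
            default:
                throw new IllegalArgumentException("Unknown text encoding: " + encodingByte);
        }
    }

    // frameBody is the encoding byte followed by the encoded text.
    static String decodeTextFrame(byte[] frameBody) {
        if (frameBody.length == 0) {
            return "";
        }
        Charset charset = charsetFor(frameBody[0] & 0xFF);
        String text = new String(frameBody, 1, frameBody.length - 1, charset);

        // Drop trailing NUL terminators, if any.
        int end = text.length();
        while (end > 0 && text.charAt(end - 1) == '\0') {
            end--;
        }
        return text.substring(0, end);
    }
}
```

With the question's code this would be called as decodeTextFrame(abyFrameData), provided abyFrameData holds only the frame body and not the frame header.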