I have a multilingual Java application that reads and stores data in a MySQL database.
The table collation is utf8_general_ci.
For the JDBC connection I use the parameters useUnicode=true&characterEncoding=UTF-8.
Characters like ® are displayed properly, but Chinese characters are garbled.
When I add the JVM argument -Dfile.encoding=UTF8, Chinese characters are displayed correctly but characters like ® are not.
What should I do so that all characters in the input, whatever the language, are displayed correctly?
Edit:
The input data comes from UDP packets, which are processed by the get methods on a ByteBuffer
and by a getString method implemented like this:
    public String getString() {
        byte[] remainingBytes = new byte[this.byteBuffer.remaining()];
        this.byteBuffer.slice().get(remainingBytes);
        String dataString = new String(remainingBytes);
        int stringEnd = dataString.indexOf(0);
        if (stringEnd == -1) {
            return null;
        } else {
            dataString = dataString.substring(0, stringEnd);
            this.byteBuffer.position(this.byteBuffer.position() + dataString.getBytes().length + 1);
            return dataString;
        }
    }
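One thing that stands out in this method: new String(remainingBytes) and dataString.getBytes() both use the JVM's default charset, which is exactly what -Dfile.encoding changes. A minimal sketch of a variant that does not depend on the default charset follows. It assumes the sender encodes strings as UTF-8; that is an assumption, and if some senders use ISO-8859-1 (where ® is the single byte 0xAE) that charset would have to be used for those packets instead. The names PacketReader and WIRE_CHARSET are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PacketReader {
    // Assumption: strings on the wire are NUL-terminated UTF-8.
    private static final Charset WIRE_CHARSET = StandardCharsets.UTF_8;

    private final ByteBuffer byteBuffer;

    PacketReader(ByteBuffer byteBuffer) {
        this.byteBuffer = byteBuffer;
    }

    /** Reads one NUL-terminated string, or returns null if no terminator remains. */
    String getString() {
        ByteBuffer view = byteBuffer.slice();
        // Find the terminator on the raw bytes, before any decoding happens.
        int length = 0;
        while (length < view.limit() && view.get(length) != 0) {
            length++;
        }
        if (length == view.limit()) {
            return null; // no terminator found; position is left unchanged
        }
        byte[] raw = new byte[length];
        view.get(raw);
        // Skip the string bytes plus the terminating NUL.
        byteBuffer.position(byteBuffer.position() + length + 1);
        return new String(raw, WIRE_CHARSET);
    }
}
```

Counting the bytes before decoding also avoids recomputing the length with getBytes(), which can differ from the number of bytes actually read when the decode was lossy.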
You state that when you enter the character directly in MySQL it works, and that it is only incorrect when Java puts it there.
Have you tried getting your code to look for these characters and dump them to a text file or to standard output for a short test, so you can compare that output with what was sent to the database?
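A rough sketch of such a dump, assuming you can call a helper on the string just before it is handed to JDBC. The class and method names (TextDebug, codePoints, utf8Hex) are hypothetical. Printing code points and hex bytes instead of the characters themselves avoids the console's own encoding getting in the way.

```java
import java.nio.charset.StandardCharsets;

final class TextDebug {
    /** Lists the Unicode code points of s, e.g. "U+00AE U+4E2D". */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    /** Lists the bytes of s when encoded as UTF-8, as two-digit hex values. */
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```

For example, System.out.println(TextDebug.codePoints(value)) on a correctly decoded ® should show U+00AE. If it shows something like U+00C2 U+00AE or U+FFFD, the string was already damaged before it reached the database.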
It is also worth logging the database transactions to see what was sent:
As far as the MySQL configuration goes, ensure that both the tables and the MySQL server itself are running in UTF-8 mode:
Ensure the above has been put into /etc/mysql/my.cnf.
For each database name you have, run the below to dump out its tables and add an ALTER line for each table to convert it to utf8.
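As an illustration of what such ALTER lines can look like, here is a minimal Java sketch that builds one statement per table. The helper name AlterStatements.convertToCharset and the table names are hypothetical, and the charset and collation are parameters you would pick yourself. It only generates text, so review the statements before running them against a real database.

```java
import java.util.ArrayList;
import java.util.List;

final class AlterStatements {
    /** Builds one "ALTER TABLE ... CONVERT TO CHARACTER SET ..." line per table name. */
    static List<String> convertToCharset(List<String> tables, String charset, String collation) {
        List<String> statements = new ArrayList<>();
        for (String table : tables) {
            // Backticks inside an identifier are escaped by doubling them.
            String quoted = "`" + table.replace("`", "``") + "`";
            statements.add("ALTER TABLE " + quoted
                    + " CONVERT TO CHARACTER SET " + charset
                    + " COLLATE " + collation + ";");
        }
        return statements;
    }
}
```

Calling convertToCharset(Arrays.asList("users", "messages"), "utf8", "utf8_general_ci") yields two statements that can be pasted into a SQL script.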
Other things worth trying, especially if this server is meant to write UTF-8:
Linux system environment:
Unix Locale
locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
To fix this
Restart the box so that services pick up UTF-8. As a user you will need to
log out completely and back in; check locale before rebooting to ensure
it is working.
This means you can now input Japanese over your local SSH session (if you use PuTTY,
UTF-8 needs to be selected in its settings).
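To see what the JVM itself ended up with after a locale change, a small check like the following can help. It is only a sketch; the class name EncodingReport is hypothetical, and LANG is simply the environment variable shown in the locale output above (it will be null where it is not set).

```java
import java.nio.charset.Charset;

final class EncodingReport {
    /** Summarises the encoding-related settings visible to the running JVM. */
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name()
                + ", LANG=" + System.getenv("LANG");
    }
}
```

Printing EncodingReport.describe() at application start-up makes it easy to compare a run with and without -Dfile.encoding=UTF8.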
add URIEncoding="UTF-8" to
I also added to
In the web.xml for local sites (the web.xml within WEB-INF; I am unsure whether
this is essential),
look for the mapping and also add:
I have come across specific character corruption issues before. It is worth saving the UDP string and viewing it in a good UTF-8 editor (Notepad++ with UTF-8 enabled, or Kate or something similar on KDE).
Also test different UTF-8 characters, both the ones that work and the ones that potentially don't, via standard output or a file, and look them up on
http://www.fileformat.info/info/unicode/char/search.htm
and ensure the characters are the same
http://www.fileformat.info/info/unicode/char/00ae/index.htm
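To make the comparison concrete, the sketch below reproduces one common kind of corruption: UTF-8 bytes decoded with the wrong charset. It assumes the wrong charset is ISO-8859-1, which is only one possibility, and the names MojibakeDemo, misdecode and repair are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    /** Encodes s as UTF-8, then decodes those bytes as ISO-8859-1 (the wrong charset). */
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    /** Reverses misdecode: recovers the original bytes and decodes them as UTF-8. */
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }
}
```

With this mix-up, ® (U+00AE, UTF-8 bytes C2 AE) turns into the two characters Â®. If what you see in the database matches that pattern, the bytes themselves are intact and only the decoding step is wrong.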