I have isolated a problem we are running into down to a simple test:
Run a plain JDBC insert or update on a longtext column with this parameter value:
new String(new char[]{0xDBFF, 0xDC00});
An exception occurs stating:
“Incorrect string value: ‘\xF4\x8F\xB0\x80’ for column”
It appears that these two characters, when paired together, form a valid Chinese symbol (individually they are meaningless).
How can I deal with these messed up characters? They form a valid symbol and Character.isDefined returns true for both characters. Stripping out specifically those character codes from all strings seems like it would be begging for more problems with different combinations of Chinese characters.
Encoded as UTF-8, this character takes 4 bytes. MySQL 5.0/5.1 does not support 4-byte UTF-8 characters; this is a known limitation.
MySQL 5.5 does support 4-byte UTF-8 characters, through the utf8mb4 character set.
See section 9.1.10, Unicode Support, in the MySQL reference manual.
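To see why the pair ends up as 4 bytes, here is a minimal sketch that decodes the surrogate pair from the question and prints its UTF-8 bytes in the same form as the error message. The class name `SupplementaryCheck` and the helpers `hasSupplementary` and `utf8Hex` are made up for illustration, and the code assumes Java 8 or later. Something like `hasSupplementary` could be used to detect values that a MySQL 5.0/5.1 utf8 column would reject before running the insert.

```java
import java.nio.charset.StandardCharsets;

public class SupplementaryCheck {

    // Hypothetical helper: true if the string contains any code point
    // above U+FFFF, i.e. one that needs 4 bytes in UTF-8.
    static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    // Hypothetical helper: UTF-8 bytes of the string written as \xNN,
    // for comparison with the MySQL error message.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The value from the question: a high and a low surrogate.
        String s = new String(new char[]{0xDBFF, 0xDC00});

        // The two chars combine into a single code point.
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 10fc00
        System.out.println(utf8Hex(s));          // \xF4\x8F\xB0\x80
        System.out.println(hasSupplementary(s)); // true
    }
}
```

The two chars are one code point outside the Basic Multilingual Plane, which is why removing either char on its own would corrupt the text rather than fix it.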