In my database I get the error:
com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column
I use Java and MySQL 5. As far as I know, 4-byte UTF-8 characters are legal in Java but not in MySQL 5 (whose `utf8` character set stores at most 3 bytes per character). I suspect this may be causing my problem, so I want to check what kind of data I have. Here's my question:
How can I check whether my UTF-8 data contains only 3-byte characters or also 4-byte ones?
UTF-8 encodes everything in the Basic Multilingual Plane (BMP, i.e. U+0000 to U+FFFF inclusive) in 1-3 bytes, and everything outside it in 4 bytes. Therefore, you just need to check whether every character in your string is in the BMP.
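To see the 1-3 byte versus 4 byte split concretely, here is a small sketch that encodes single code points with the JDK's `StandardCharsets.UTF_8` and counts the resulting bytes. The class name `Utf8Lengths` and the helper `utf8Length` are hypothetical, chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Returns how many bytes the given code point occupies when encoded as UTF-8.
    // Character.toChars produces one char for BMP code points and a surrogate pair otherwise.
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length('A'));     // U+0041 (ASCII): 1 byte
        System.out.println(utf8Length(0x00E9));  // U+00E9 (e with acute): 2 bytes
        System.out.println(utf8Length(0x20AC));  // U+20AC (euro sign): 3 bytes, still in the BMP
        System.out.println(utf8Length(0x1F600)); // U+1F600 (emoji, outside the BMP): 4 bytes
    }
}
```

Only the last code point lies above U+FFFF, and it is the only one that needs 4 bytes.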
In Java, that means checking whether any `char` (which is a UTF-16 code unit) is a high or low surrogate character, as Java uses surrogate pairs to encode non-BMP characters:
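A minimal sketch of that check, using the JDK methods `Character.isHighSurrogate` and `Character.isLowSurrogate`. The class `BmpCheck` and method `isBmpOnly` are hypothetical names; the idea is that a string passing this check should need at most 3 bytes per character in UTF-8.

```java
public class BmpCheck {
    // Returns true if no char in s is a high or low surrogate,
    // i.e. every character is in the BMP and encodes to 1-3 bytes in UTF-8.
    static boolean isBmpOnly(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c) || Character.isLowSurrogate(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isBmpOnly("caf\u00e9 \u20ac"));   // true: only BMP characters
        System.out.println(isBmpOnly("smile \uD83D\uDE00")); // false: U+1F600 is a surrogate pair
    }
}
```

This version also rejects an unpaired surrogate. A Java 8 alternative such as `s.codePoints().allMatch(Character::isBmpCodePoint)` behaves slightly differently: `codePoints()` yields an unpaired surrogate as its own value, which `isBmpCodePoint` accepts.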