Does anyone know if the standard Java library (any version) provides a means of calculating the length of the binary encoding of a string (specifically UTF-8 in this case) without actually generating the encoded output? In other words, I’m looking for an efficient equivalent of this:
"some really long string".getBytes("UTF-8").length
I need to calculate a length prefix for potentially long serialized messages.
Here’s an implementation based on the UTF-8 specification:
This implementation does not tolerate malformed strings, such as those containing unpaired surrogates.
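A minimal sketch of this kind of calculation is shown below. The class and method names (`Utf8Length`, `utf8Length`) are hypothetical, and the code is an illustration of the byte-count rules in the UTF-8 specification rather than a definitive implementation. It assumes well-formed UTF-16 input and returns a `long`, since a very long string can need more bytes than an `int` can hold.

```java
// Hypothetical helper: the names here are illustrative only.
final class Utf8Length {
    private Utf8Length() {}

    /**
     * Counts the bytes UTF-8 would need for the given text without encoding it.
     * Assumption: the input is well-formed UTF-16. An unpaired surrogate is
     * counted as 3 bytes here, which may differ from what String.getBytes produces.
     */
    static long utf8Length(CharSequence s) {
        long count = 0;
        for (int i = 0, len = s.length(); i < len; i++) {
            char ch = s.charAt(i);
            if (ch <= 0x7F) {
                count += 1;                 // U+0000..U+007F: one byte
            } else if (ch <= 0x7FF) {
                count += 2;                 // U+0080..U+07FF: two bytes
            } else if (Character.isHighSurrogate(ch)
                    && i + 1 < len
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                count += 4;                 // surrogate pair: one supplementary code point
                i++;                        // skip the low surrogate already accounted for
            } else {
                count += 3;                 // remaining BMP characters: three bytes
            }
        }
        return count;
    }
}
```

Calling `Utf8Length.utf8Length(message)` would then give the value for the length prefix, with no intermediate byte array allocated.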
Here’s a JUnit 4 test for verification:
Please excuse the compact formatting.
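The same kind of check can be sketched in plain Java, without JUnit, so that it runs with nothing but the JDK. The names `Utf8LengthCheck`, `utf8Length`, and `check` are hypothetical. The counting rule is restated compactly using code points so the block stands alone, and each sample is compared against the length of the real encoding from `getBytes`.

```java
import java.nio.charset.StandardCharsets;

// Plain-Java stand-in for a unit test; all names here are hypothetical.
final class Utf8LengthCheck {
    private Utf8LengthCheck() {}

    // Compact restatement of the counting rule, per code point.
    // Assumption: well-formed input (no unpaired surrogates).
    static long utf8Length(String s) {
        return s.codePoints()
                .mapToLong(cp -> cp <= 0x7F ? 1L : cp <= 0x7FF ? 2L : cp <= 0xFFFF ? 3L : 4L)
                .sum();
    }

    // Throws if the counted length differs from the length of the actual encoding.
    static void check(String s) {
        long expected = s.getBytes(StandardCharsets.UTF_8).length;
        long actual = utf8Length(s);
        if (expected != actual) {
            throw new AssertionError("expected " + expected + " but counted " + actual);
        }
    }

    public static void main(String[] args) {
        check("");                                  // empty string
        check("plain ASCII");                       // one byte per character
        check("\u00e9\u00fc\u00f1");                // two-byte characters
        check("\u20ac\u4e2d\u6587");                // three-byte characters
        check("\ud83d\ude00");                      // surrogate pair, four bytes
        check("mix: a\u00e9\u20ac\ud83d\ude00");    // all widths combined
        System.out.println("all lengths match");
    }
}
```

Comparing against `getBytes` keeps the expected values out of the test data, at the cost of only covering well-formed samples.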