My program receives an integer array from a browser application; the integers are meant to be interpreted as UTF-8 bytes (example in the code below). I can echo the resulting string (“theString” in the code below) back to the browser and everything is fine. In the Java program itself, however, it is not: the input string is “Hällo”, but the Java program prints “Hõllo”.
import java.io.*;
import java.nio.charset.*;

public class TestCode {
    public static void main(String[] args) throws IOException {
        // H : 72
        // ä : 195 164
        // l : 108
        // o : 111
        // the following is the input sent from browser representing String = "Hällo"
        int[] utf8Array = {72, 195, 164, 108, 108, 111};
        String notYet = new String(utf8Array, 0, utf8Array.length);
        String theString = new String(notYet.getBytes(), Charset.forName("UTF-8"));
        System.out.println(theString);
    }
}
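This will do the trick: convert the int[] to a byte[] first, then build the String from those bytes with the UTF-8 charset.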
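A minimal sketch of the fix, assuming the browser sends each UTF-8 byte as an int in the range 0 to 255: narrow every int to a byte, then decode the byte array as UTF-8. The class name Utf8Decode and the helper decodeUtf8 are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf8Decode {
    // Hypothetical helper: treats each int as one unsigned UTF-8 byte (0..255).
    static String decodeUtf8(int[] utf8Array) {
        byte[] bytes = new byte[utf8Array.length];
        for (int i = 0; i < utf8Array.length; i++) {
            // The cast keeps the low 8 bits, so 195 becomes the byte 0xC3.
            bytes[i] = (byte) utf8Array[i];
        }
        // Decode with an explicit charset instead of the platform default.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Input from the question, representing "Hällo".
        int[] utf8Array = {72, 195, 164, 108, 108, 111};
        System.out.println(decodeUtf8(utf8Array));
    }
}
```

The resulting string holds five characters, with ä stored as the single code point U+00E4. What the console actually displays may still depend on the terminal's own encoding.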
The problem with passing int[] directly is that the String class interprets every int as a separate character, while after converting to byte[], String treats the input as raw bytes and understands that 195, 164 is actually a single character consisting of two bytes rather than two characters.
UPDATE: Answering your comment: unfortunately, Java is that verbose. Compare it to Scala, where mapping each int to a byte and building the string fits in a single expression.
Once again, the difference between int and byte is not just the compiler being picky; they really mean different things when it comes to UTF-8 encoding.
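To make the int versus byte distinction concrete, here is a small sketch that feeds the same six numbers to both String constructors and compares the results. The class IntVersusByte and its two helper methods are hypothetical names introduced for this illustration.

```java
import java.nio.charset.StandardCharsets;

class IntVersusByte {
    // String(int[], int, int) treats every int as one Unicode code point.
    static String asCodePoints(int[] values) {
        return new String(values, 0, values.length);
    }

    // Narrow each int to a byte, then let String decode the bytes as UTF-8.
    static String asUtf8Bytes(int[] values) {
        byte[] bytes = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            bytes[i] = (byte) values[i];
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int[] utf8Array = {72, 195, 164, 108, 108, 111};

        String fromInts = asCodePoints(utf8Array);
        String fromBytes = asUtf8Bytes(utf8Array);

        // 195 and 164 stay two separate characters (U+00C3 and U+00A4).
        System.out.println(fromInts.length());       // 6
        System.out.println(fromInts.codePointAt(1)); // 195

        // 195, 164 are decoded together into one character, U+00E4.
        System.out.println(fromBytes.length());       // 5
        System.out.println(fromBytes.codePointAt(1)); // 228
    }
}
```

Printing lengths and code points rather than the strings themselves keeps the comparison independent of the console's encoding.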