I am not able to read UTF-8 characters from a file as bytes.
The UTF-8 characters are displayed as question marks (?) when I convert the bytes to characters.
The code snippet below shows how the file is read.
Please tell me how to read UTF-8 characters from a file,
and what the problem is with my byte array reading process.
    public static void getData() {
        FormFile file = actionForm.getFile("UTF-8");
        byte[] mybt;
        try {
            byte[] fileContents = file.getFileData();
            StringBuffer sb = new StringBuffer();
            for (int i = 0; i < fileContents.length; i++) {
                sb.append((char) fileContents[i]);
            }
            System.out.println(sb.toString());
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
Output: ??Docum??ents (the input file content is "ÞDocumÿents"; it contains some Spanish characters.)
This is the problem:
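You’re converting each byte to a char just by casting it. That treats every byte as a separate character, much like decoding as ISO-8859-1 (Latin-1), so the multi-byte sequences UTF-8 uses for non-ASCII characters are never combined. Because Java bytes are signed, bytes above 0x7F are also sign-extended into chars in the U+FF80 to U+FFFF range, which typically print as "?".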
To read text from an InputStream, you adapt it via InputStreamReader, specifying the character encoding.
The simplest way of reading the whole of a file into a string would be to use Guava:
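With Guava on the classpath this can be a one-liner such as Files.asCharSource(file, StandardCharsets.UTF_8).read(), where file is a java.io.File (the older Files.toString(file, Charsets.UTF_8) is deprecated in recent releases). If Guava isn't available, the InputStreamReader approach can be sketched with the JDK alone. The class and method names below are made up for illustration and are not part of Struts or Guava:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named only for this example.
final class Utf8StreamReader {

    // Reads the whole stream and decodes its bytes as UTF-8.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        // InputStreamReader does the byte-to-char decoding with the given charset;
        // try-with-resources closes the reader (and the underlying stream).
        try (Reader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }
}
```

In the question's code this would probably be called as Utf8StreamReader.readAll(file.getInputStream()), assuming the uploaded file really is UTF-8 encoded.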
Or to convert a byte array:
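A minimal sketch, assuming the array holds the complete UTF-8 encoded content (for example what file.getFileData() returns in the question). The class name Utf8Bytes and its methods are hypothetical; the second method only reproduces the question's per-byte cast for comparison:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named only for this example.
final class Utf8Bytes {

    // Decodes the whole array at once, so multi-byte sequences are combined.
    static String decode(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    // The approach from the question: one char per byte, no decoding.
    static String castEachByte(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append((char) b);
        }
        return sb.toString();
    }
}
```

For the input "ÞDocumÿents", decode returns the original 11 characters, while castEachByte returns 13 chars because Þ and ÿ each occupy two bytes in UTF-8. The older overload new String(data, "UTF-8") behaves the same way but declares UnsupportedEncodingException, which is where that exception type would actually be needed.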