I have a text file containing data that I need to preload into a SQLite database. I saved it in res/raw.
I read the whole file using readTxtFromRaw(), then I use the StringTokenizer class to process the file line by line.
However, the String returned by readTxtFromRaw() does not show the foreign characters that are in the file. I need them because some of the text is Spanish or French. Am I missing something?
Code:
String fileCont = new String(readTxtFromRaw(R.raw.wordstext));
StringTokenizer myToken = new StringTokenizer(fileCont , "\t\n\r\f");
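A side note on the tokenizing step: StringTokenizer treats every character in the delimiter string as a separate separator. With "\t\n\r\f", a tab inside a line splits that line as well, and runs of delimiters such as \r\n are skipped instead of producing empty tokens. The small standalone sketch below reproduces the call with made-up sample text. The tokens() helper and the sample data are hypothetical and are only there to show what the loop sees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDemo {
    // Hypothetical helper: splits text with the same delimiters as the question's call
    public static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        StringTokenizer myToken = new StringTokenizer(text, "\t\n\r\f");
        while (myToken.hasMoreTokens()) {
            // Each token is a tab-separated field, not a whole line,
            // because '\t' is one of the delimiters
            result.add(myToken.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical two-line, tab-separated sample (\u00e9 is the letter é)
        String fileCont = "casa\thouse\r\nt\u00e9\ttea\r\n";
        System.out.println(tokens(fileCont).size()); // 4
    }
}
```

If you really need whole lines, tokenize on "\n\r\f" only and split each line on "\t" afterwards.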
The readTxtFromRaw method is:
private String readTxtFromRaw(Integer rawResource) throws IOException
{
    InputStream inputStream = mCtx.getResources().openRawResource(rawResource);
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    int i = inputStream.read();
    while (i != -1)
    {
        byteArrayOutputStream.write(i);
        i = inputStream.read();
    }
    inputStream.close();
    return byteArrayOutputStream.toString();
}
The file was created using Eclipse, and all characters appear fine in Eclipse.
Could this have something to do with Eclipse itself? I set a breakpoint and inspected myToken in the Watch window. I tried to manually replace the garbled character with the correct one (for example í or é), and it would not let me.
Have you checked the file's encoding?
byteArrayOutputStream.toString() converts the bytes using the platform's default character encoding. If the file was saved in a different encoding, the accented characters are decoded incorrectly, so they end up garbled or replaced in your output. Have you already tried byteArrayOutputStream.toString(String enc)? Try "UTF-8", "ISO-8859-1" or "UTF-16" for the encoding, whichever matches the encoding the file was actually saved in.
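Here is a minimal plain-Java sketch of that suggestion. It assumes you know which encoding the file was saved in. The readAll() helper and the sample string are made up for illustration. On Android you would pass mCtx.getResources().openRawResource(rawResource) as the stream. The sketch also reads in blocks instead of one byte at a time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class RawTextReader {
    // Hypothetical helper: reads the whole stream and decodes it with an
    // explicit charset name instead of the platform default
    public static String readAll(InputStream in, String charsetName) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        try {
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            in.close();
        }
        // toString(String) throws UnsupportedEncodingException for unknown names
        return out.toString(charsetName);
    }

    public static void main(String[] args) throws IOException {
        String original = "caf\u00e9 / \u00edndice";
        // Simulate a file that an editor saved as ISO-8859-1
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);

        String wrong = readAll(new ByteArrayInputStream(latin1), "UTF-8");
        String right = readAll(new ByteArrayInputStream(latin1), "ISO-8859-1");

        System.out.println(wrong.equals(original)); // false
        System.out.println(right.equals(original)); // true
    }
}
```

On Android the platform default charset is UTF-8. If Eclipse saved the file as ISO-8859-1 or Cp1252, passing "ISO-8859-1" (or re-saving the file as UTF-8 and decoding with "UTF-8") is the likely fix.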