I’m attempting to parse XML containing foreign letters (æ, ø and å specifically), but I’m having problems getting them parsed correctly. I don’t get any errors, but the letters come out like this: instead of æ I’m getting Ã¦, instead of å I’m getting Ã¥, and instead of ø I’m getting Ã¸.
I also just noticed the char – isn’t displaying properly.
I realise I could do .replaceAll for the 3 letters, but I’m not sure whether the problem is down to me making a mistake somewhere or whether it’s just not possible without going down the replaceAll route.
The code:
    private Document getDomElement(String xml) {
        Document doc = null;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();
            InputSource is = new InputSource(new ByteArrayInputStream(
                    xml.getBytes()));
            // is.setCharacterStream(new StringReader(xml));
            is.setEncoding("UTF-8");
            Log.i(TAG, "Encoding: " + is.getEncoding());
            doc = db.parse(is);
        } catch (ParserConfigurationException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        } catch (SAXException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        } catch (IOException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        }
        // return DOM
        return doc;
    }

    private String getValue(Element item, String str) {
        NodeList n = item.getElementsByTagName(str);
        return this.getElementValue(n.item(0));
    }

    private final String getElementValue(Node elem) {
        Node child;
        if (elem != null) {
            if (elem.hasChildNodes()) {
                for (child = elem.getFirstChild(); child != null; child = child
                        .getNextSibling()) {
                    if (child.getNodeType() == Node.TEXT_NODE) {
                        return child.getNodeValue();
                    }
                }
            }
        }
        return "";
    }
}
Let me know if you need to see more code than this.
Appreciate any suggestions – Thanks.
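The problem is that you are converting the String argument to bytes using getBytes(), which encodes with the platform default charset rather than necessarily the UTF-8 you then declare on the InputSource. You’d be better off not converting to bytes at all and giving the InputSource a character stream (a StringReader) instead. I see that you have that commented out in the code. Is there any reason you don’t want to use it?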
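A minimal sketch of the character-stream approach, using only standard JDK classes. The class and method names (ReaderParseSketch, parseFromString) are made up for illustration and are not from the question's code, and the Android Log calls are left out so the example runs on a plain JVM:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class ReaderParseSketch {

    // Parses XML that is already held in a String. The parser reads
    // characters from the Reader, so no String-to-bytes conversion
    // (and therefore no charset choice) happens at this step.
    static Document parseFromString(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputSource is = new InputSource(new StringReader(xml));
        return db.parse(is);
    }

    public static void main(String[] args) throws Exception {
        // \u00e6 \u00f8 \u00e5 are the letters æ, ø and å, written as escapes
        // so the example does not depend on the source file's encoding.
        Document doc = parseFromString("<name>\u00e6\u00f8\u00e5</name>");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

This only helps if the String itself is correct. If the letters are already wrong in the String before it reaches the parser, the bad decoding happened earlier, for example when the response was read.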
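If you have to use a byte array, it’s best to name the charset explicitly, for example xml.getBytes("UTF-8"), so that the bytes match the encoding you set on the InputSource.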
On older versions of Android, the default charset depended on the locale.
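If a byte array really is required, the explicit-charset variant might look roughly like this. It is a sketch under a few assumptions: the names Utf8BytesParseSketch and parseFromUtf8Bytes are hypothetical, and StandardCharsets needs Java 7 or later (on older runtimes, xml.getBytes("UTF-8") does the same job but throws a checked UnsupportedEncodingException):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class Utf8BytesParseSketch {

    // Encodes the String as UTF-8 explicitly, then tells the parser to
    // decode the bytes as UTF-8, so both sides agree regardless of the
    // platform default charset.
    static Document parseFromUtf8Bytes(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        InputSource is = new InputSource(new ByteArrayInputStream(bytes));
        is.setEncoding("UTF-8");
        return db.parse(is);
    }

    public static void main(String[] args) throws Exception {
        // \u00e6 \u00f8 \u00e5 are the letters æ, ø and å.
        Document doc = parseFromUtf8Bytes("<name>\u00e6\u00f8\u00e5</name>");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

The point of the design is that the charset used to encode and the charset used to decode are both written out in the code, rather than one of them being left to the platform default.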