I’m trying to get weather data from Google’s weather API and parse the document with JDOM.
This is the code I’m using:
SAXBuilder builder = new SAXBuilder();
Document doc;
URL url = new URL(GOOGLE_WEATHER_API);
doc = builder.build(url);
Element root = doc.getRootElement();
Element weather = root.getChild("weather");
List currentConditions = weather.getChildren("current_conditions");
...
The problem is that whenever the XML returned by Google contains an umlaut (ü, ä, ö…), I get a JDOMParseException:
org.jdom.input.JDOMParseException: Error on line 1 of document http://www.google.de/ig/api?weather=Heidelberg&hl=en:
Fatal Error: com.sap.engine.lib.xml.parser.ParserException:
Incorrect encoded sequence detected at character (hex) 0x72, (bin) 1110010.
Check whether the input parsed contains correctly encoded characters.
Encoding used is: ‘utf-8’ (http://www.google.de/ig/api?weather=Heidelberg&hl=en, row:1, col:191):
Incorrect encoded sequence detected at character (hex) 0x72, (bin) 1110010.
Check whether the input parsed contains correctly encoded characters.
Encoding used is: ‘utf-8’ (http://www.google.de/ig/api?weather=Heidelberg&hl=en, row:1, col:191)
When I open the URL in a browser and check the properties of the page, the encoding is shown as UTF-8, so I don’t know why it does not work.
Does anybody have an idea?
Best regards,
Paul
The XML returned from that URL does not declare any encoding in its XML header. Instead, the encoding (ISO-8859-1) is specified in the Content-Type header of the HTTP response. Apparently, even though you are passing a URL to JDOM, it does not take that header into account and falls back to UTF-8, which is the default for XML with no declared encoding. The umlaut bytes are valid ISO-8859-1 but not valid UTF-8, hence the "incorrect encoded sequence" error. You need to either handle the HTTP response yourself (reading the header and passing the correct encoding to JDOM), or use a parser which can do that for you (although I don’t know of any standard XML parser which will).
If you used the standard XML APIs, you would do something like:
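A minimal sketch of that idea using only the JDK's built-in APIs, not necessarily the exact code the answer had in mind. The class and method names (`CharsetAwareXmlFetch`, `charsetFrom`, `parse`, `fetch`) are made up for illustration. It assumes the server sends a Content-Type header such as `text/xml; charset=ISO-8859-1`, and it falls back to UTF-8 when no charset is given. The stream is wrapped in a `Reader`, so the parser receives already-decoded characters and does not have to guess the encoding.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: names are illustrative, not part of any library.
public class CharsetAwareXmlFetch {

    // Pull the charset parameter out of a Content-Type value such as
    // "text/xml; charset=ISO-8859-1". Falls back to UTF-8 (the XML default)
    // when the header is missing or carries no charset.
    static String charsetFrom(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    return p.substring("charset=".length()).replace("\"", "").trim();
                }
            }
        }
        return "UTF-8";
    }

    // Decode the bytes with the charset from the HTTP header, then hand the
    // parser a character stream so it does not apply its own encoding guess.
    static Document parse(InputStream in, String contentType) throws Exception {
        Reader reader = new InputStreamReader(in, Charset.forName(charsetFrom(contentType)));
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(reader));
    }

    // Open the connection ourselves so the Content-Type header is available.
    static Document fetch(String address) throws Exception {
        URLConnection conn = new URL(address).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return parse(in, conn.getContentType());
        }
    }
}
```

The same `Reader` could presumably be passed to JDOM's `SAXBuilder.build(Reader)` instead of `DocumentBuilder.parse`, which would leave the rest of the original JDOM code unchanged.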