I'm trying to parse a response from the Google weather API, but I get this

Asked: June 10, 2026

I'm trying to parse a response from the Google weather API, but I get this "not well-formed" error. As far as I can tell, the response is well formed.

Here’s the relevant code:

f = urllib.urlopen(WEATHERPATH + sys.argv[1])
parser = make_parser()
parser.setContentHandler(GoogleWeatherHandler())
parser.parse(f)

XML:

<?xml version="1.0"?>
<xml_api_reply version="1">
    <weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0" >
        <forecast_information>
            <city data="Ciudad Juárez, Chihuahua"/><postal_code data="Juarez"/>
            <latitude_e6 data=""/>
            <longitude_e6 data=""/>
            <forecast_date data="2012-08-14"/>
            <current_date_time data="2012-08-15 02:51:00 +0000"/>
            <unit_system data="US"/></forecast_information>
            <current_conditions>
                <condition data="Cloudy"/>
                <temp_f data="91"/>
                <temp_c data="33"/>
                <humidity data="Humidity: 22%"/>
                <icon data="/ig/images/weather/cloudy.gif"/>
                <wind_condition data="Wind: SE at 6 mph"/>
            </current_conditions>
            <!-- similar markup -->
</weather>
</xml_api_reply>

and the error:

Traceback (most recent call last):
  File "weather.py", line 34, in <module>
    main()
  File "weather.py", line 30, in main
    parser.parse(f)
  File "c:\Python26\lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "c:\Python26\lib\xml\sax\xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "c:\Python26\lib\xml\sax\expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "c:\Python26\lib\xml\sax\handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:179: not well-formed (invalid token)

All imports are already in place. I trust the interpreter, but I can’t find the error in the XML. Second, it would be helpful to know what <unknown>:1:179 means.

Thanks.

1 Answer

Editorial Team · Answer 1 · 2026-06-10T01:30:33+00:00

Looks like the accented á character in Juárez is the problem. The XML declaration doesn’t name an encoding, so the parser falls back to a default, which for XML is UTF-8, and in UTF-8 that byte value is invalid on its own. In other words, the parser is expecting UTF-8 and your actual encoding is probably ISO-8859-1.
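The failure can be reproduced in isolation, and the exception object also answers the question about <unknown>:1:179. What follows is a minimal sketch, assuming Python 3 and a made-up one-element document (RAW and locate_error are hypothetical names, not part of the original script). The text before the message is "system ID:line:column"; <unknown> is what gets printed when the input source has no system ID, which is the case when parsing from a stream or a string.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical stand-in for the real response: the declaration names no
# encoding, and "á" is sent as the single ISO-8859-1 byte 0xE1.
RAW = b'<?xml version="1.0"?><city data="Ciudad Ju\xe1rez, Chihuahua"/>'


def locate_error(data):
    """Parse data and return the SAXParseException raised, or None if it parses."""
    try:
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc
    return None


error = locate_error(RAW)
print(error)                    # formatted as "<system id>:<line>:<column>: <message>"
print(error.getSystemId())      # None here, hence "<unknown>" in the text form
print(error.getLineNumber())    # line on which the parser gave up
print(error.getColumnNumber())  # column on that line
```

Since the real response arrives as one long line, the line number is 1 and the column points at the offending byte.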

Configure the parser to expect ISO-8859-1 and your problem should go away.
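One way to do that with xml.sax is to wrap the stream in an InputSource and set its encoding before calling parse. This is only a sketch, assuming Python 3; CityHandler and parse_latin1 are hypothetical names standing in for GoogleWeatherHandler and your own code, and the sample bytes are made up.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class CityHandler(ContentHandler):
    """Hypothetical handler that records the data attribute of <city>."""

    def __init__(self):
        super().__init__()
        self.city = None

    def startElement(self, name, attrs):
        if name == "city":
            self.city = attrs.get("data")


def parse_latin1(stream, handler):
    """Parse a byte stream while telling the parser it is ISO-8859-1."""
    source = InputSource()
    source.setByteStream(stream)
    source.setEncoding("ISO-8859-1")  # used instead of the UTF-8 default
    parser = make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler


# Made-up sample with the same problem as the real response.
raw = b'<?xml version="1.0"?><city data="Ciudad Ju\xe1rez, Chihuahua"/>'
result = parse_latin1(io.BytesIO(raw), CityHandler())
```

In the script from the question, the object returned by urlopen would be passed in place of the BytesIO stream.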

If you can modify the XML, change the header to

<?xml version="1.0" encoding="iso-8859-1" ?>
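If the XML comes from a server you don't control, the same effect can be had by patching the declaration in the downloaded bytes before handing them to the parser. A rough sketch, assuming Python 3 and that the response starts with exactly the declaration shown in the question (declare_latin1 and CityHandler are hypothetical names):

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Declaration as sent (no encoding) and the replacement suggested above.
OLD_DECL = b'<?xml version="1.0"?>'
NEW_DECL = b'<?xml version="1.0" encoding="iso-8859-1" ?>'


def declare_latin1(data):
    """Swap in a declaration naming ISO-8859-1 if the old one is present."""
    if data.startswith(OLD_DECL):
        return NEW_DECL + data[len(OLD_DECL):]
    return data  # left untouched when the declaration is different


class CityHandler(ContentHandler):
    """Hypothetical handler that records the data attribute of <city>."""

    def __init__(self):
        super().__init__()
        self.city = None

    def startElement(self, name, attrs):
        if name == "city":
            self.city = attrs.get("data")


# Made-up sample with the same problem as the real response.
raw = b'<?xml version="1.0"?><city data="Ciudad Ju\xe1rez, Chihuahua"/>'
handler = CityHandler()
xml.sax.parseString(declare_latin1(raw), handler)
```

This is more brittle than configuring the parser, because it depends on the exact bytes of the declaration.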

Unicode is the standard that defines the character set: an abstract assignment of a unique number (a code point) to every character it covers, across all supported writing systems.

UTF-8 is just one of several possible ways to encode those characters in 8-bit bytes. Since UTF-8 has to encode far more than 256 characters, it uses 2-, 3- and 4-byte sequences for everything outside ASCII. To avoid ambiguity, those sequences must begin with byte values that cannot otherwise be used, so a set of high-order bit patterns (and thus certain ranges of byte values) is reserved to mark the beginning of these multi-byte sequences. The byte that ISO-8859-1 (a different way to encode characters) uses for á happens to fall in one of the ranges that UTF-8 reserves for marking multi-byte sequences.
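The reserved bit patterns can be written out as a small classifier. This is an illustrative sketch (utf8_byte_role is a hypothetical helper) that looks only at the high-order bits; a few values that match a lead pattern, such as 0xC0, 0xC1 and 0xF5 to 0xF7, are still never valid in real UTF-8.

```python
def utf8_byte_role(byte):
    """Classify one byte value by the UTF-8 bit pattern of its high bits."""
    if byte < 0x80:   # 0xxxxxxx
        return "single byte (ASCII)"
    if byte < 0xC0:   # 10xxxxxx
        return "continuation byte"
    if byte < 0xE0:   # 110xxxxx
        return "lead byte of a 2-byte sequence"
    if byte < 0xF0:   # 1110xxxx
        return "lead byte of a 3-byte sequence"
    if byte < 0xF8:   # 11110xxx
        return "lead byte of a 4-byte sequence"
    return "never used in UTF-8"


for value in (0x72, 0xA1, 0xC3, 0xE1):
    print(hex(value), utf8_byte_role(value))
```

The ISO-8859-1 byte for á, 0xE1, lands in the 1110xxxx range, so a UTF-8 decoder reads it as the start of a multi-byte sequence.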

Part of the confusion over these issues stems from the fact that, for character codes 0x00 through 0x7F, the encodings discussed here (ASCII, ISO-8859-1, Windows-1252 and UTF-8) all produce the same single byte, for backwards compatibility. When you venture into characters that are not part of standard ASCII, things diverge depending on the encoding.
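A quick check of that claim, as a sketch in Python 3 (the sample strings are just examples taken from the response above):

```python
ENCODINGS = ("ascii", "iso-8859-1", "cp1252", "utf-8")

# Pure ASCII text: every encoding in the list yields identical bytes.
ascii_text = "Ciudad"
ascii_bytes = {name: ascii_text.encode(name) for name in ENCODINGS}
print(ascii_bytes)

# Text with a non-ASCII character: the results diverge.
accented = "Ju\u00e1rez"
latin1_bytes = accented.encode("iso-8859-1")  # one byte for the accented letter
utf8_bytes = accented.encode("utf-8")         # two bytes for the accented letter
print(len(latin1_bytes), len(utf8_bytes))
```

This is why most of the response parses fine and only the city name trips the parser.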

To get more specific:

Unicode á   - U+00E1
ISO-8859-1  - 0xE1
UTF-8       - 0xC3 0xA1
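Those three values can be confirmed from a Python 3 prompt; a minimal sketch:

```python
ch = "\u00e1"  # the character á

code_point = ord(ch)               # the Unicode number assigned to á
latin1 = ch.encode("iso-8859-1")   # one byte
utf8 = ch.encode("utf-8")          # two bytes

print(f"U+{code_point:04X}", latin1.hex(), utf8.hex())
```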

What happened here is that historically (before Unicode) á was already assigned the value 0xE1 in various computer standards (ISO-8859-1 and Windows-1252, for example). When Unicode was devised, they kept this code, but when it came time to encode this value in UTF-8, the rules specify that it becomes the 2-byte sequence 0xC3 0xA1. The single byte value 0xE1 is not permitted to occur by itself in UTF-8: it marks the start of a 3-byte sequence and must be followed by two continuation bytes, which the "r" and "e" that follow it in your response are not.
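Both halves of that can be sketched in Python 3: building the 2-byte form by hand, and watching a lone 0xE1 get rejected. The helper names (encode_two_byte, first_bad_byte) are hypothetical, and the hand-rolled encoder covers only the two-byte range U+0080 to U+07FF.

```python
def encode_two_byte(code_point):
    """Encode a code point in U+0080..U+07FF as UTF-8 by hand."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("this sketch only handles two-byte code points")
    lead = 0xC0 | (code_point >> 6)            # 110xxxxx carries the top five bits
    continuation = 0x80 | (code_point & 0x3F)  # 10xxxxxx carries the low six bits
    return bytes([lead, continuation])


def first_bad_byte(data):
    """Return (offset, byte value) where UTF-8 decoding fails, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.object[exc.start]
    return None


by_hand = encode_two_byte(0xE1)                 # same bytes as "á".encode("utf-8")
problem = first_bad_byte(b"Ju\xe1rez")          # the ISO-8859-1 bytes from the response
as_latin1 = b"Ju\xe1rez".decode("iso-8859-1")   # decodes cleanly with the right encoding

print(by_hand.hex(), problem)
```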
