We’re parsing an XML document using JAXB and get this error:
[org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.]
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:315)
What exactly does this mean, and how can we resolve it?
We are executing the code as:
jaxbContext = JAXBContext.newInstance(Results.class);
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
unmarshaller.setSchema(getSchema());
results = (Results) unmarshaller.unmarshal(new FileInputStream(inputFile));
Update
Issue appears to be due to this “funny” character in the XML file: ¿
Why would this cause such a problem?
Update 2
There are two of those weird characters in the file. They are around the middle of the file. Note that the file is created based on data in a database and those weird characters somehow got into the database.
Update 3
Here is the full XML snippet:
<Description><![CDATA[Mt. Belvieu ¿ Texas]]></Description>
Update 4
Note that there is no <?xml ...?> header.
The HEX for the special character is BF
So, your problem is that JAXB treats XML files without a <?xml ...?> header as UTF-8, while your file uses some other encoding (probably ISO-8859-1 or Windows-1252, if the 0xBF byte is actually intended to mean ¿). In UTF-8, 0xBF can only appear as a continuation byte inside a multi-byte sequence, so the parser rejects it when it shows up on its own.
If you can change the producer of the file, you can add a <?xml ...?> header that declares the actual encoding, or simply write the file as UTF-8.
If you can’t change the producer, you have to use an InputStreamReader with an explicit encoding, because (unfortunately) JAXB doesn’t allow you to change its default encoding. However, this solution is fragile: it fails on input files whose <?xml ...?> header declares a different encoding.
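The InputStreamReader workaround can be sketched roughly as follows. This is only an illustration, assuming the file really is ISO-8859-1. The class EncodingSketch and the helpers readerFor, isValidUtf8 and parseText are made-up names. The JDK's built-in DOM parser stands in for JAXB so that the sketch runs without extra libraries. With JAXB the same Reader would be passed to unmarshaller.unmarshal(...), as shown in the comment.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EncodingSketch {

    // Hypothetical helper: decode the bytes ourselves, so the XML parser
    // receives characters and never has to guess the encoding.
    // With JAXB (assumed usage, mirroring the code in the question):
    //   results = (Results) unmarshaller.unmarshal(
    //       readerFor(new FileInputStream(inputFile), StandardCharsets.ISO_8859_1));
    public static Reader readerFor(InputStream in, Charset charset) {
        return new InputStreamReader(in, charset);
    }

    // Strict UTF-8 check: a fresh decoder reports malformed input
    // instead of silently replacing it.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Parses the XML through a Reader with an explicit charset and
    // returns the text content of the root element.
    public static String parseText(byte[] xml, Charset charset) throws Exception {
        Reader reader = readerFor(new ByteArrayInputStream(xml), charset);
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(reader));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The snippet from the question, encoded so that the special
        // character becomes the single byte 0xBF.
        byte[] xml = "<Description><![CDATA[Mt. Belvieu \u00BF Texas]]></Description>"
                .getBytes(StandardCharsets.ISO_8859_1);

        // A lone 0xBF is not valid UTF-8, which is what the parser complains about.
        System.out.println("valid UTF-8: " + isValidUtf8(xml));

        // Decoding explicitly as ISO-8859-1 recovers the intended character.
        String text = parseText(xml, StandardCharsets.ISO_8859_1);
        System.out.println("round trip ok: " + text.equals("Mt. Belvieu \u00BF Texas"));
    }
}
```

Because the parser receives already-decoded characters here, it will not use an encoding named in an <?xml ...?> header to decode the input, which is why this approach breaks for files that declare a different encoding.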