I’m trying to parse some XML with EclipseLink MOXy, and it fails on the line with the xsi:schemaLocation attribute. If I remove that attribute, the file parses fine. However, I’ve got 100GiB of XML to wade through, and changing the source files is not an option.
It’s been suggested that setting XmlParser.setNamespaceAware(false) should make it work, but I have no idea how to configure this without breaking right into the guts of MOXy.
<record>
<header>
<!-- citation-id: 14404534; type: journal_article; -->
<identifier>info:doi/10.1007/s10973-004-0435-2</identifier>
<datestamp>2009-04-28</datestamp>
<setSpec>J</setSpec>
<setSpec>J:1007</setSpec>
<setSpec>J:1007:2777</setSpec>
</header>
<metadata>
<crossref xmlns="http://www.crossref.org/xschema/1.0"
xsi:schemaLocation="http://www.crossref.org/xschema/1.0 http://www.crossref.org/schema/unixref1.0.xsd">
<journal>
<journal_metadata language="en">
[...]
The exception I get when the xsi: prefix is present is:
org.springframework.oxm.UnmarshallingFailureException: JAXB unmarshalling exception; nested exception is javax.xml.bind.UnmarshalException
- with linked exception:
[Exception [EclipseLink-25004] (Eclipse Persistence Services - 2.4.0.v20120608-r11652): org.eclipse.persistence.exceptions.XMLMarshalException
Exception Description: An error occurred unmarshalling the document
Internal Exception: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[13,107]
Message: http://www.w3.org/TR/1999/REC-xml-names-19990114#AttributePrefixUnbound?crossref&xsi:schemaLocation&xsi]
There currently isn’t an option in EclipseLink JAXB (MOXy) to tell it to ignore namespaces. But there is an approach you can use by leveraging a StAX parser.
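A minimal sketch of the StAX side of that approach, using only the JDK: it sets `XMLInputFactory.IS_NAMESPACE_AWARE` to `false` so that an undeclared `xsi:` prefix no longer aborts parsing. The class name, the sample document, and the helper methods are illustrative assumptions rather than part of the original demo. Support for turning namespace processing off is optional in StAX, so this assumes a parser that accepts the property, as the JDK's built-in one should.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, named for illustration only.
final class NonNamespaceAwareStaxSketch {

    // Assumed sample input: the xsi prefix is used but never declared.
    static final String SAMPLE =
            "<foo xsi:schemaLocation=\"http://www.example.com/ns example.xsd\">"
            + "<bar>Hello World</bar></foo>";

    // Opens a StAX reader with namespace processing switched on or off.
    static XMLStreamReader openReader(String xml, boolean namespaceAware)
            throws XMLStreamException {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        xif.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, namespaceAware);
        return xif.createXMLStreamReader(new StringReader(xml));
    }

    // Walks the whole document and counts START_ELEMENT events.
    // Throws XMLStreamException if the parser rejects the input.
    static int countStartElements(String xml, boolean namespaceAware)
            throws XMLStreamException {
        XMLStreamReader xsr = openReader(xml, namespaceAware);
        int count = 0;
        try {
            while (xsr.hasNext()) {
                if (xsr.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            xsr.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Parses despite the unbound xsi prefix.
        System.out.println(countStartElements(SAMPLE, false));
        // With a JAXB implementation on the classpath, the same reader
        // could be handed to Unmarshaller.unmarshal(XMLStreamReader).
    }
}
```

Passing `true` instead of `false` to `countStartElements` should reproduce the unbound-prefix failure from the question, which makes it easy to confirm that the property is what changes the behaviour.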
Demo
You can create a StAX XMLStreamReader on the XML input that is not namespace aware and then have MOXy unmarshal from that.
Java Model (Foo)
Input (input.xml)
Below is a simplified version of the XML from your question. Note that this XML is not properly namespace qualified since it is missing the namespace declaration for the xsi prefix.
Output
Below is the output from running the demo code.