I’m attempting to use JAXB with data that is not well-formed XML: the element names are invalid because they begin with numeric characters. Here’s an overview of what the schema looks like:
<xs:element name="ITEM">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="01" />
            <xs:element name="08" />
            <xs:element name="10">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="10_A" />
                        <xs:element name="10_B" />
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
            ...
            ...Many more elements...
            ...
        </xs:sequence>
    </xs:complexType>
</xs:element>
Unfortunately, I don’t have the ability to modify this. Since the full ITEM is huge and has many levels of depth, using an automated tool like JAXB to create classes is a must. To do so, I prefixed the names of the elements with a character (in this case, ‘m’) so that XJC would accept them. I was hoping that at runtime, I could map the XML tags to my Java class in order to unmarshal the input into a Java object. In particular, something like this:
@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "", propOrder = {
    "m01",
    "m08",
    "m10",
    ...
})
@XmlRootElement(name = "ITEM")
public class ITEM {
    @XmlElement(name = "01")
    protected String m01;
    @XmlElement(name = "08")
    protected String m08;
    @XmlElement(name = "10")
    protected M10 m10;
    ...
}
M10 would look like:
@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "", propOrder = {
    "m10a",
    "m10b",
    ...
})
public static class M10 {
    @XmlElement(name = "10_A")
    protected String m10a;
    @XmlElement(name = "10_B")
    protected String m10b;
    ...
}
I was hoping that JAXB would be able to match the @XmlElement tag to the tag in the input, but unfortunately this didn’t work out for me because JAXB won’t have any of this business with improper tags. If anybody is interested, the particular exception is:
org.xml.sax.SAXParseException: The content of elements must consist of well-formed character data or markup
Does anyone have any advice on how to get around this problem? I feel like I could run a regex swap on the input XML before JAXB parses it (and thus bypass this issue completely), but modifying the input in such a way is rather undesirable.
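For illustration, a minimal sketch of what such a regex swap might look like. The class and method names are hypothetical, and it assumes the input never contains a literal "<" followed by a digit outside of a tag (for example inside a comment or CDATA section), since those would be rewritten too. With this approach the @XmlElement names would have to be the prefixed ones (e.g. "m01") rather than the original ones.

```java
import java.util.regex.Pattern;

final class NumericTagPrefixer {

    // Matches "<" or "</" only when the tag name that follows starts with a digit.
    private static final Pattern NUMERIC_TAG = Pattern.compile("<(/?)(?=[0-9])");

    // Inserts an 'm' in front of every tag name that begins with a digit.
    static String prefixNumericTags(String xml) {
        return NUMERIC_TAG.matcher(xml).replaceAll("<$1m");
    }
}
```

For example, prefixNumericTags("<ITEM><01>a</01></ITEM>") yields "<ITEM><m01>a</m01></ITEM>", which a standard parser accepts.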
It is not the JAXB (JSR-222) implementation complaining, but the underlying parser being used. The trick will be to find a tolerant XML parser.
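One way to confirm this is to feed the same input to a parser with no JAXB involved at all. The sketch below uses the SAX parser that ships with the JDK; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class ParserCheck {

    // Returns true if the JDK's default SAX parser refuses the document.
    static boolean isRejectedByParser(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return false;
        } catch (SAXParseException e) {
            // Same exception type as in the question, raised before JAXB sees any event.
            return true;
        }
    }
}
```

Calling isRejectedByParser("<ITEM><01>x</01></ITEM>") returns true, while the same document with the element renamed to m01 parses without complaint.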
StAX
If you can find a StAX (JSR-173) parser capable of handling this content then you could do the following:
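A minimal sketch of the wiring, assuming the JAXB API (javax.xml.bind) is on the classpath and that you already have an XMLStreamReader from the tolerant parser. The helper class, the method name, and the Sample class are hypothetical; Sample only stands in for a generated class such as ITEM.

```java
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.stream.XMLStreamReader;

final class StaxUnmarshalSketch {

    // Hypothetical stand-in for a generated class; not part of the question.
    @XmlRootElement(name = "SAMPLE")
    public static class Sample {
        public String value;
    }

    // JAXB only consumes the events the reader produces, so any well-formedness
    // checks are whatever the supplied reader's parser enforces.
    static <T> T unmarshal(Class<T> type, XMLStreamReader reader) throws JAXBException {
        JAXBContext context = JAXBContext.newInstance(type);
        Unmarshaller unmarshaller = context.createUnmarshaller();
        return unmarshaller.unmarshal(reader, type).getValue();
    }
}
```

With the classes from the question, the call would look like ITEM item = StaxUnmarshalSketch.unmarshal(ITEM.class, xsr), where xsr comes from the tolerant parser's XMLInputFactory.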
SAX
Or if you find a SAX parser then you can do the following:
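A minimal sketch along the same lines, again assuming javax.xml.bind is on the classpath and that the XMLReader comes from the tolerant parser. The helper class, the method name, and the Sample class are hypothetical. JAXB's UnmarshallerHandler is a SAX ContentHandler, so it can be plugged straight into the reader; the reader should be namespace-aware, as far as I know.

```java
import java.io.IOException;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.UnmarshallerHandler;
import javax.xml.bind.annotation.XmlRootElement;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

final class SaxUnmarshalSketch {

    // Hypothetical stand-in for a generated class; not part of the question.
    @XmlRootElement(name = "SAMPLE")
    public static class Sample {
        public String value;
    }

    static <T> T unmarshal(Class<T> type, XMLReader reader, InputSource input)
            throws JAXBException, IOException, SAXException {
        JAXBContext context = JAXBContext.newInstance(type);
        UnmarshallerHandler handler = context.createUnmarshaller().getUnmarshallerHandler();

        // The reader drives the parse; JAXB just receives the SAX events.
        reader.setContentHandler(handler);
        reader.parse(input);

        // The unmarshalled object is available once parsing has finished.
        return type.cast(handler.getResult());
    }
}
```

With the classes from the question, this would be called as ITEM item = SaxUnmarshalSketch.unmarshal(ITEM.class, xmlReader, inputSource).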