How can I convert invalid XML into valid XML according to a given XSD schema?
For example, I have the following XSD schema:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
and the following invalid XML:
<?xml version="1.0" encoding="UTF-8"?>
<note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../my_xsd.xsd">
<to>reviver@mail.com</to>
<from>sender@mail.com</from>
<body>blablabla</body> <!-- INVALID LINE, IT IS NOT IN THE RIGHT PLACE -->
<heading>head</heading>
</note>
My question is: do JAXB, XStream, or other XML parsers offer a way to convert my invalid XML, according to the given schema, into valid XML like this:
<?xml version="1.0" encoding="UTF-8"?>
<note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../my_xsd.xsd">
<to>reviver@mail.com</to>
<from>sender@mail.com</from>
<heading>head</heading>
<body>blablabla</body>
</note>
Assumption: the input is well-formed XML.
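In general, the answer is no: no algorithm could convert an arbitrary XML input document into a valid and semantically correct instance of a given schema, because the schema only describes what a valid document looks like, not what the author of an invalid one meant. For example, if <heading> were missing entirely, nothing in the schema says what its content should be.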
However, if the ways in which the input can be invalid are constrained to a small set of problems, such as the child elements of <note> being out of order, then yes: just about any XML parsing and serialization library could help you fix the problem. As @KevinDTimm alluded to, you’ll want to turn off schema validation in these tools so that they don’t reject the input before you have had a chance to fix it.
Personally, I would use XSLT, since that’s what I’m used to. You could have it read the child elements in whatever order they occur and output them as XML in the correct order.
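A minimal sketch of that XSLT approach, run from Java through the JDK’s built-in javax.xml.transform API so it fits a JAXB/XStream code base. The class name ReorderNote and the hard-coded element order are illustrative assumptions: the stylesheet is an identity transform plus one template for <note> that lists the four children in the order the schema’s xs:sequence declares them. The default TransformerFactory does not validate the input against the XSD, so the out-of-order document is accepted as input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ReorderNote {

    // XSLT 1.0 stylesheet (hypothetical, written for this example's schema).
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" indent=\"yes\"/>"
        // Identity template: copy every node and attribute unchanged by default.
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
        // For <note>: copy its attributes, then its children in schema order.
        // Anything else inside <note> (comments, whitespace) is dropped.
      + "<xsl:template match=\"note\">"
      + "<xsl:copy>"
      + "<xsl:apply-templates select=\"@*\"/>"
      + "<xsl:apply-templates select=\"to\"/>"
      + "<xsl:apply-templates select=\"from\"/>"
      + "<xsl:apply-templates select=\"heading\"/>"
      + "<xsl:apply-templates select=\"body\"/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // The invalid document from the question (<body> before <heading>).
    static final String INVALID_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      + "<note xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
      + " xsi:noNamespaceSchemaLocation=\"../my_xsd.xsd\">"
      + "<to>reviver@mail.com</to>"
      + "<from>sender@mail.com</from>"
      + "<body>blablabla</body>"
      + "<heading>head</heading>"
      + "</note>";

    // Applies the stylesheet to xml and returns the serialized result.
    static String reorder(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(reorder(INVALID_XML));
    }
}
```

Because the order is hard-coded, this fixes only this one kind of problem (children out of order); a missing or misspelled element would still produce invalid output.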
But the example tools you list, JAXB and XStream, are not merely XML parsers but XML/object binding libraries: they map XML to Java objects and back. If you need to correct validation errors while building the objects, that would complicate things. A separate process of first correcting the XML and then deserializing it would be simpler.
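To keep correction and deserialization as separate steps, one option is to check the corrected document against the schema before handing it to the binding library. This is a sketch using javax.xml.validation from the JDK; the class name ValidateNote is hypothetical, the question’s schema is inlined, and the xsi:noNamespaceSchemaLocation attribute is omitted from the sample documents for brevity.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class ValidateNote {

    // The XSD from the question, inlined so the example is self-contained.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"note\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"to\" type=\"xs:string\"/>"
      + "<xs:element name=\"from\" type=\"xs:string\"/>"
      + "<xs:element name=\"heading\" type=\"xs:string\"/>"
      + "<xs:element name=\"body\" type=\"xs:string\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Input as received: <body> comes before <heading>.
    static final String INVALID_XML =
        "<note><to>reviver@mail.com</to><from>sender@mail.com</from>"
      + "<body>blablabla</body><heading>head</heading></note>";

    // The same document after the correction step.
    static final String CORRECTED_XML =
        "<note><to>reviver@mail.com</to><from>sender@mail.com</from>"
      + "<heading>head</heading><body>blablabla</body></note>";

    static Schema loadSchema() throws SAXException {
        SchemaFactory f = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return f.newSchema(new StreamSource(new StringReader(XSD)));
    }

    // Returns true if xml is valid against the schema. With no ErrorHandler
    // set, the validator throws SAXException on the first validation error.
    static boolean isValid(Schema schema, String xml) throws IOException {
        Validator v = schema.newValidator();
        try {
            v.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            System.out.println("invalid: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = loadSchema();
        System.out.println(isValid(schema, INVALID_XML));   // false
        System.out.println(isValid(schema, CORRECTED_XML)); // true
        // Only a document that passes would be handed on to deserialization.
    }
}
```

If you use JAXB, the same Schema object can also be passed to Unmarshaller.setSchema, so that unmarshalling itself rejects documents the correction step did not fix.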