Background:
We’re building an application that allows our customers to supply data in a predefined XML format (i.e. one we don’t control). The XSD is supplied to us by a third party, and we expect to receive an XML file that passes schema validation before we process it.
The Problem:
The XSD that we are supplied with includes a default and target namespace, which means that if a customer supplies an XML file that doesn’t include the namespace, then the validation will pass, because the validator finds no schema for the elements it sees and so has nothing to check them against. We obviously don’t want them supplying files that appear to pass but shouldn’t, but the bigger concern is the mass of additional checks we would need to do on each element if I can’t find a way to make the XML validation reliable.
The Questions:
- Is it possible to force .NET to perform validation while ignoring the namespace on the supplied XML and XSD, i.e. in some way “assume” that the namespace was attached?
- Is it possible to remove the namespaces in memory, easily, and reliably?
- What is the best practice in these situations?
Solutions that I have so far:
- Remove the namespace from the XSD every time it’s updated (which shouldn’t be very often). This doesn’t get around the fact that if they supply a namespace it will still pass validation.
- Remove the namespace from the XSD, AND find a way to strip the namespace from the incoming XML every time. This seems like a lot of code to perform something simple.
- Do some pre-qualification on the XML file before it is validated to ensure that it has the correct namespace. It seems wrong to fail them due to an invalid namespace if the contents of the file are correct.
- Create a duplicate XSD that doesn’t have a namespace, however if they just supply the wrong namespace, or a different namespace, then it will still pass.
Example XSD:
<?xml version="1.0"?>
<xsd:schema version='3.09' elementFormDefault='qualified' attributeFormDefault='unqualified' id='blah' targetNamespace='urn:schemas-blah.com:blahExample' xmlns='urn:blah:blahExample' xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
...
</xsd:schema>
XML with a namespace that is different:
<?xml version="1.0" encoding="UTF-8" ?>
<root xmlns="urn:myCompany.com:blahExample1" attr1="2001-03-03" attr2="google" >
...
</root>
XML without a namespace at all:
<?xml version="1.0" encoding="UTF-8" ?>
<root attr1="2001-03-03" attr2="google" >
...
</root>
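To make the behaviour concrete, here is a minimal sketch that reproduces it. The inline schema, the `urn:example:blah` namespace and the `NamespaceLaxDemo` class are hypothetical stand-ins, not the real third-party XSD. It also shows one option that may be worth testing: turning on `XmlSchemaValidationFlags.ReportValidationWarnings`, which makes the validator raise a warning for elements it has no schema for instead of staying silent.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class NamespaceLaxDemo
{
    // Hypothetical stand-in for the third-party XSD (assumption, not the real schema).
    const string Xsd = @"<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'
            targetNamespace='urn:example:blah' xmlns='urn:example:blah'
            elementFormDefault='qualified'>
          <xsd:element name='root'>
            <xsd:complexType>
              <xsd:attribute name='attr1' type='xsd:date' use='required'/>
            </xsd:complexType>
          </xsd:element>
        </xsd:schema>";

    // Validates the XML text and returns every message the validator raised.
    public static List<string> Validate(string xml, bool reportWarnings)
    {
        var messages = new List<string>();
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };

        // Passing null uses the targetNamespace declared inside the schema itself.
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));

        if (reportWarnings)
        {
            // Without this flag, elements with no matching schema are skipped silently.
            settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        }

        settings.ValidationEventHandler +=
            (sender, e) => messages.Add(e.Severity + ": " + e.Message);

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // reading the document drives the validation
        }
        return messages;
    }
}
```

Calling `NamespaceLaxDemo.Validate("<root attr1='not-a-date'/>", false)` should return an empty list even though `attr1` is not a date, which is the problem described above. With `reportWarnings` set to `true` the same document produces warnings, and the caller can choose to treat those as failures.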
Trying to solve the same problem, I came up with what I think is a fairly clean solution. For clarity, I have omitted some validation of the input parameters.
First, the scenario: there is a web service that receives a file that is supposed to be “well-formed” XML and valid against an XSD. Of course, we trust neither its “well-formedness” nor that it is valid against the XSD that “we know” is the correct one.
The code for such a web service method is presented below; I think it’s self-explanatory.
The main point of interest is the order in which the validations happen: you don’t check for the namespace before loading, you check after, but cleanly.
I decided I could live with some exception handling, as it’s expected that most files will be “good” and because that’s the framework’s way of dealing with it (so I won’t fight it).
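A minimal sketch of that ordering follows. It is an illustration rather than the original listing: the `UploadValidator` class, the `Check` method and its return strings are hypothetical names, and the caller is assumed to supply a non-empty `XmlSchemaSet` and the expected namespace.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class UploadValidator
{
    // Returns "OK" or a short reason for rejection (hypothetical contract).
    public static string Check(byte[] fileBytes, XmlSchemaSet schemas, string expectedNamespace)
    {
        var doc = new XmlDocument();

        // 1. Well-formedness: Load throws XmlException on malformed input.
        try
        {
            using (var stream = new MemoryStream(fileBytes))
            {
                doc.Load(stream);
            }
        }
        catch (XmlException ex)
        {
            return "Not well-formed: " + ex.Message;
        }

        // 2. Namespace: checked after loading, on the parsed root element.
        string actual = doc.DocumentElement.NamespaceURI;
        if (actual != expectedNamespace)
        {
            return "Wrong namespace: '" + actual + "'";
        }

        // 3. Schema validity: collect messages instead of throwing on the first one.
        var errors = new List<string>();
        doc.Schemas = schemas;
        doc.Validate((sender, e) => errors.Add(e.Message));

        return errors.Count == 0 ? "OK" : "Invalid: " + string.Join("; ", errors);
    }
}
```

Because the namespace is compared before `Validate` runs, a document in the wrong namespace is rejected with a specific reason rather than slipping through unvalidated.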
Now, the XSD would have something like:
And the “good” XML would be something like:
I tested “bad format XML”, “invalid input according to XSD” and “incorrect namespace”.
References:
- Read from memorystream
- Trying to avoid exception handling when checking for well-formedness
- Validating against XSD, catch the errors
- Interesting post about inline schema validation
Hi Martin,
the comment section is too short for my answer, so I’ll give it here. It may or may not be a complete answer; let’s improve it together 🙂
I made the following tests:
The strategy I followed (which I prefer) was: if the document doesn’t comply, then don’t accept it, but give some information on the reason (e.g. “wrong namespace”).
This strategy seems contrary to what you previously said:
In this case, it seems you can just ignore the namespace defined in the XML. To do that you would skip the check for the correct namespace:
Other ideas…
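In a parallel line of thought, to replace the supplied namespace with your own, you might think of setting
doc.DocumentElement.NamespaceURI = "mySpecialNamespace"
thus replacing the namespace of the root element. Note, though, that NamespaceURI is read-only on XmlNode, so this assignment won’t compile as written; the elements would have to be re-created or renamed in the new namespace instead. Reference: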
add-multiple-namespaces-to-the-root-element
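Since `NamespaceURI` has no setter on `XmlNode`, one way to get the same effect is LINQ to XML, where `XElement.Name` can be assigned. The following is only a sketch under assumptions: `NamespaceRewriter` and `MoveToNamespace` are hypothetical names, and the code moves every element into the target namespace regardless of where it started, which is only appropriate if the format uses a single namespace.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceRewriter
{
    // Moves every element of the document into the given namespace, in place.
    public static XDocument MoveToNamespace(XDocument doc, XNamespace target)
    {
        // ToList() snapshots the elements so renaming does not disturb the enumeration.
        foreach (XElement element in doc.Descendants().ToList())
        {
            // Drop existing xmlns declarations so they cannot conflict with the new name.
            element.Attributes().Where(a => a.IsNamespaceDeclaration).Remove();

            // Keep the local name, swap the namespace.
            element.Name = target + element.Name.LocalName;
        }
        return doc;
    }
}
```

For example, `NamespaceRewriter.MoveToNamespace(XDocument.Parse(xml), "urn:example:blah")` leaves attributes such as `attr1` untouched and only changes the element names. The result can then be validated against the namespaced XSD. Whether silently correcting a customer's namespace is acceptable is a policy decision rather than a technical one.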