I’m creating an XML file based on someone else’s XSD specification, but I just can’t figure out why it doesn’t validate.
Here’s the rule:
<xs:simpleType name="NonEmptyStringType">
<xs:restriction base="xs:string">
<xs:minLength value="1" />
<xs:pattern value="[^\t\n\r]*[^\s][^\t\n\r]*" />
</xs:restriction>
</xs:simpleType>
in which I read the pattern as follows:
[^\t\n\r]* match anything that is not a tab, newline or carriage return, 0 or more times
[^\s] match anything that is not a space
[^\t\n\r]* match anything that is not a tab, newline or carriage return, 0 or more times
and the following is one example of the many XML values that fail to match:
<Zipcode>3506 RT</Zipcode>
It’s not matching 3506 RT (or 3506RT for that matter, and many other things I would expect to match) according to xmllint, with the following error:
element Zipcode: Schemas validity error : Element '{http://www.reeleezee.nl/taxonomy/1.23}Zipcode': [facet 'pattern'] The value '3506 RT' is not accepted by the pattern '[^\t\n\r]*[^\s][^\t\n\r]*'.
Any hints on what I’m not interpreting right? (I don’t understand the strictness of their NonEmptyStringType btw, I would just use .+)
As requested, here’s the zipcode declaration:
<xs:element name="Zipcode" minOccurs="0" nillable="true" rse:CanIgnore="true">
<xs:annotation>
<xs:documentation>Postcode</xs:documentation>
</xs:annotation>
<xs:simpleType>
<xs:restriction base="NonEmptyStringType">
<xs:maxLength value="10" />
</xs:restriction>
</xs:simpleType>
</xs:element>
As you can see, this links back to the pattern in NonEmptyStringType (the first rule posted above).
This regex looks fine to me. I think it’s a bug in your validation tool… they are often buggy in edge-cases.
OK, just checked: Xerces accepts it; xmllint fails (I see you were using xmllint). I’ve found several times in the past that Xerces is correct and xmllint has problems in unusual cases, and this regex is unusual. (I have to say, I actually love xmllint, it’s really fast, but the XSD spec is huge, complex and confusing, and the xmllint folks haven’t nailed all the edge cases yet.)
The two online validators I tried also accept it: http://www.utilities-online.info/xsdvalidation and http://www.freeformatter.com/xml-validator-xsd.html
BTW: for Xerces, I downloaded their Java version, and found their class jaxp.SourceValidator the best tool for validating. But I believe it’s the same code that is already in Java.
EDIT: I did some more tests in Xerces, to ensure that the regex can fail (i.e. it is active). It fails if there is a \n anywhere (same for \t, though I didn’t test \r).
Checking the spec, \s is defined as [#x20\t\n\r] (in this table). That makes it clear that the regex is saying you can’t have \t, \n or \r anywhere. But you can have as many literal space characters (#x20) as you like, provided they aren’t all space characters (i.e. there is at least one non-space char to match that [^\s]; btw, that could be written as \S). Xerces confirms this: all spaces gives an error.
Maybe they want to allow space literals (both padding and interspersing), provided there is some value in there (i.e. not all spaces).
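If you want to sanity-check that reading without any XSD validator, here is a rough sketch in Python. It is only an approximation under two assumptions of mine: Python's own \s matches more characters than the XSD one, so I spell the class out as space, tab, newline and carriage return, and XSD patterns must match the whole value, so I use fullmatch. The names NON_EMPTY and accepts are made up for this illustration, and the minLength/maxLength facets are not checked here.

```python
import re

# XSD's \s is only space, tab, newline and carriage return, so it is spelled
# out explicitly instead of relying on Python's broader \s.
NON_EMPTY = re.compile(r"[^\t\n\r]*[^ \t\n\r][^\t\n\r]*")


def accepts(value: str) -> bool:
    """Return True if the whole value matches the translated pattern facet."""
    # fullmatch mirrors the implicit anchoring of XSD patterns.
    return NON_EMPTY.fullmatch(value) is not None


# Print the verdict for a few values discussed in this thread.
for sample in ["3506 RT", "3506RT", "   ", "3506\nRT", ""]:
    print(repr(sample), accepts(sample))
```

With this translation both zipcodes from the question are accepted, while the all-spaces value, the value containing a newline and the empty string are rejected, which is the same behaviour I saw from Xerces.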