I have a question about specifying the content of an element when defining a DTD for XML. I want an element named “Description” that may contain any interleaved sequence of Courseref elements and PCDATA. I’m using the following declaration in my DTD:
<!ELEMENT Description (#PCDATA|Courseref)* >
However, I want a stricter constraint than *. I want to use +, which should require at least one PCDATA or Courseref. But when I use + instead of *, I get a parse error from xmllint.
I’m new to DTDs and want to know whether the XML DTD spec makes it illegal to use the + operator here.
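For reference, a minimal file that reproduces the problem might look like the sketch below. The internal DTD subset, the EMPTY declaration for Courseref, and the sample text are assumptions made for illustration; only the Description declaration comes from my actual DTD.

```xml
<?xml version="1.0"?>
<!DOCTYPE Description [
  <!-- The declaration in question: same as above, but with + instead of * -->
  <!ELEMENT Description (#PCDATA|Courseref)+ >
  <!-- Assumed for the example: Courseref has no content -->
  <!ELEMENT Courseref EMPTY>
]>
<Description>See <Courseref/> for details.</Description>
```

Saved under a hypothetical name such as desc.xml, this can be checked with xmllint --valid --noout desc.xml. Changing the + back to * makes the parse error go away.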
Yes. The XML spec requires that content models of the form
(#PCDATA | a | b | ...)*
list #PCDATA first and use * rather than + (or anything else) as the occurrence indicator (http://www.w3.org/TR/xml/#NT-Mixed).
A lot of design considerations played into this, most of them now of purely historical importance. One, however, may be worth noticing: if + were allowed and you did write
<!ELEMENT Description (#PCDATA|Courseref)+ >
the element declaration would define precisely the same set of valid element instances as the form using *. The token #PCDATA matches zero or more characters of parsed character data, so an element instance like <Description/> would be valid against either form of the declaration (the zero-length string matches the content-model token #PCDATA once, so the requirement that a +-marked choice be satisfied at least once would be met).
You might convey your intent here by making Description contain
(p | Courseref)+
and saying in the documentation that empty p (paragraph) elements are frowned upon. But DTDs do not provide a way of requiring any minimum length for a #PCDATA string. That’s one reason some people prefer to use XSD, Schematron, or Relax NG.
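As a rough sketch of that workaround, the document below moves the text into p elements so that Description has pure element content, where + is permitted. The internal DTD subset, the EMPTY declaration for Courseref, and the sample text are assumptions made for illustration, not part of the original DTD.

```xml
<?xml version="1.0"?>
<!DOCTYPE Description [
  <!-- Element content (no #PCDATA at this level), so + is allowed here -->
  <!ELEMENT Description (p | Courseref)+ >
  <!-- Text now lives inside p; the DTD still cannot forbid an empty p -->
  <!ELEMENT p (#PCDATA)>
  <!-- Assumed for the example: Courseref has no content -->
  <!ELEMENT Courseref EMPTY>
]>
<Description>
  <p>Introductory text.</p>
  <Courseref/>
  <p>More text after the reference.</p>
</Description>
```

With these declarations an empty <Description/> is no longer valid, because at least one p or Courseref child is required. A Description containing only an empty p would still validate, which is the gap the documentation note is meant to cover.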