I would like to create an XSLT stylesheet that transforms an XML document so that all elements and attributes that are not defined in the XSD are excluded from the output XML.
Let's say you have this XSD:
<xs:element name="parent">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="keptElement1" />
      <xs:element name="keptElement2" />
    </xs:sequence>
    <xs:attribute name="keptAttribute1" />
    <xs:attribute name="keptAttribute2" />
  </xs:complexType>
</xs:element>
And you have this input XML:
<parent keptAttribute1="kept"
        keptAttribute2="kept"
        notKeptAttribute3="not kept"
        notKeptAttribute4="not kept">
  <notKeptElement0>not kept</notKeptElement0>
  <keptElement1>kept</keptElement1>
  <keptElement2>kept</keptElement2>
  <notKeptElement3>not kept</notKeptElement3>
</parent>
Then I would like the output XML to look like this:
<parent keptAttribute1="kept"
        keptAttribute2="kept">
  <keptElement1>kept</keptElement1>
  <keptElement2>kept</keptElement2>
</parent>
I am able to do this by specifying the elements, but this is about as far as my XSLT skills reach. I have a problem doing this generally for all elements and all attributes.
You have two challenges here: (1) identifying the set of element names and attributes declared in the schema, with appropriate context information for local declarations, and (2) writing XSLT to retain elements and attributes which match those names or names-and-contexts.
There is also a third issue, namely specifying clearly what you mean by “elements and attributes that are (or are not) defined in the XSD schema”. For purposes of discussion I’ll assume you mean elements and attributes which could be bound to element or attribute declarations in the schema, in a validation episode (a) rooted at an arbitrary point in the input document tree and (b) starting with a top-level element declaration or attribute declaration. This assumption means several things. (a) Local element declarations will only match things in context: in your example, keptElement1 and keptElement2 will be retained only when they are children of parent, not otherwise. (b) There is no guarantee that the elements in the input would in fact be bound to the element declarations in question: if one of their ancestors is locally invalid, things get complicated fast both in XSD 1.0 and in 1.1. (c) We don’t allow for starting validation from a named type definition; we could, but it doesn’t sound as if that’s what you’re interested in. (d) We don’t allow for starting validation from local element or attribute declarations.
With those assumptions explicit, we can turn to your problem.
The first task requires that you make a list of (a) all the elements and attributes with top-level declarations in your schema, and (b) all the elements and attributes reachable from them. For top-level declarations, all we need to record is the kind of object (element or attribute) and the expanded name. For local objects, we need the kind of object and the full path from a top-level element declaration. For your sample schema (assuming it has no target namespace), list (a) consists of
    element {}parent
(I am using the convention of writing expanded names with the namespace name in braces; some call this Clark notation, for James Clark.)
List (b) consists of
    element {}parent/{}keptElement1
    element {}parent/{}keptElement2
    attribute {}parent/@{}keptAttribute1
    attribute {}parent/@{}keptAttribute2
In more complicated schemas, there will be a certain amount of bookkeeping as you go through the process of generating this list.
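For a schema as simple as the sample, that bookkeeping can itself be sketched in XSLT. The stylesheet below is only an illustration under strong assumptions: the declarations are wrapped in an xs:schema element, there is no target namespace, and every declaration is written inline with a name attribute. It ignores ref, named types, groups, and top-level attribute declarations. Run against the schema document, it prints one line per top-level element declaration and one line per element or attribute declared locally beneath it.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <!-- Start from the top-level element declarations only. -->
  <xsl:template match="/">
    <xsl:apply-templates select="xs:schema/xs:element"/>
  </xsl:template>

  <!-- Emit one line for this element declaration, one line for each
       attribute declared locally in it, then recurse into the element
       declarations whose nearest enclosing xs:element is this one. -->
  <xsl:template match="xs:element">
    <xsl:param name="context" select="''"/>
    <!-- Assumption: empty namespace name, written as {} -->
    <xsl:variable name="path" select="concat($context, '{}', @name)"/>
    <xsl:variable name="me" select="generate-id()"/>

    <xsl:value-of select="concat('element ', $path, '&#10;')"/>

    <xsl:for-each
        select=".//xs:attribute[generate-id(ancestor::xs:element[1]) = $me]">
      <xsl:value-of
          select="concat('attribute ', $path, '/@{}', @name, '&#10;')"/>
    </xsl:for-each>

    <xsl:apply-templates
        select=".//xs:element[generate-id(ancestor::xs:element[1]) = $me]">
      <xsl:with-param name="context" select="concat($path, '/')"/>
    </xsl:apply-templates>
  </xsl:template>

</xsl:stylesheet>
```

A real schema with named types, references, or namespaces needs considerably more care than this, which is where the bookkeeping comes in.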
Your second task is to write an XSLT stylesheet that keeps the elements and attributes in the list and drops the rest. (I’m assuming here that when you drop an element, you drop all its contents, too; your question talks about elements, not tags.)
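For each element in the list, write an appropriate identity transform, using the context given in the list as the match pattern (for example, match="parent" for the top-level element and match="parent/keptElement1" for a local one).
You can write a separate template for each element, or you can write several elements into one match pattern, separated by the union operator (for example, match="parent | parent/keptElement1 | parent/keptElement2").
For each attribute in the list, do the same (for example, match="parent/@keptAttribute1 | parent/@keptAttribute2").
Override the default templates for elements and attributes with empty templates (match="*" and match="@*"), to suppress all other elements and attributes. Because these wildcard patterns have a lower default priority than the name-based patterns, the listed elements and attributes are still kept.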
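Putting these steps together for the sample schema, a minimal sketch of the stylesheet might look like the following. It assumes XSLT 1.0 and an input with no namespaces; the xsl:strip-space and indent settings are presentation choices of mine, added so that dropped elements do not leave blank lines behind.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Presentation only (an assumption, not required by the approach):
       discard whitespace-only text nodes and re-indent the result. -->
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Elements from the list: copy the element itself, then let the
       other templates decide about its attributes and children. -->
  <xsl:template match="parent
                       | parent/keptElement1
                       | parent/keptElement2">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Attributes from the list: copy them unchanged. -->
  <xsl:template match="parent/@keptAttribute1
                       | parent/@keptAttribute2">
    <xsl:copy/>
  </xsl:template>

  <!-- Everything else: empty templates replace the built-in rules, so
       unlisted elements (with all their content) and unlisted
       attributes produce no output. -->
  <xsl:template match="*"/>
  <xsl:template match="@*"/>

</xsl:stylesheet>
```

Applied to the sample input, this should keep parent with keptAttribute1, keptAttribute2, keptElement1 and keptElement2, and drop the rest. A keptElement1 that is not a child of parent is dropped as well, in line with the assumption about local declarations.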
[Alternatively, as suggested by DrMacro, you can write a function or named template in XSLT to consult the list you generated in task 1, instead of writing it out into repetitive templates with explicit match patterns. Depending on your background, you may find that that approach makes it easier, or harder, to understand what the stylesheet is doing.]
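A rough sketch in the spirit of that list-driven approach, using a lookup table rather than a function or named template: the list is embedded in the stylesheet and two generic templates consult it. The k: namespace and the list format are hypothetical, invented for this illustration. The sketch also records only the immediate parent rather than the full path, which is enough for the sample schema but is a simplification in general, and it assumes an input without namespaces.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:k="urn:example:keep-list">

  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical list format: one entry per declaration; the parent
       attribute is present only for local declarations. -->
  <k:list>
    <k:element name="parent"/>
    <k:element name="keptElement1" parent="parent"/>
    <k:element name="keptElement2" parent="parent"/>
    <k:attribute name="keptAttribute1" parent="parent"/>
    <k:attribute name="keptAttribute2" parent="parent"/>
  </k:list>

  <!-- document('') reads the stylesheet itself; this assumes the
       processor can resolve the stylesheet's own URI. -->
  <xsl:variable name="list" select="document('')/*/k:list"/>

  <!-- Keep an element only if the list has an entry with its name and,
       where the entry names a parent, the same parent. -->
  <xsl:template match="*">
    <xsl:variable name="name" select="name()"/>
    <xsl:variable name="parent" select="name(..)"/>
    <xsl:if test="$list/k:element[@name = $name]
                                 [not(@parent) or @parent = $parent]">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

  <!-- Same check for attributes; name(..) is the owning element. -->
  <xsl:template match="@*">
    <xsl:variable name="name" select="name()"/>
    <xsl:variable name="parent" select="name(..)"/>
    <xsl:if test="$list/k:attribute[@name = $name]
                                   [not(@parent) or @parent = $parent]">
      <xsl:copy/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

The trade-off is the one described above: the templates no longer change when the schema does, but what the stylesheet keeps is now decided by data rather than by visible match patterns.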