I would like to parse a document using SAX, and create a subdocument from some of the elements, while processing others purely with SAX. So, given this document:
<DOC>
  <small>
    <element />
  </small>
  <entries>
    <!-- thousands here -->
  </entries>
</DOC>
I would like to parse the DOC and DOC/entries elements using the SAX ContentHandler, but when I hit <small> I want to create a new document containing just the <small> and its children.
Is there an easy way to do this, or do I have to build the DOM myself, by hand?
One approach is to create a ContentHandler that watches for events that signal the entry or exit from a <small> element. This handler acts as a proxy, and in ‘normal’ mode passes the SAX events straight through to the ‘real’ ContentHandler.
However, when entry into a <small> element is detected, the proxy is responsible for the creation of a TransformerHandler (with a no-op, ‘null’ transform), plumbed up to a DOMResult. The TransformerHandler expects all the events that a complete, well-formed document would produce; you cannot immediately send it a startElement event. Instead, simulate the beginning of a new document by invoking setDocumentLocator, startDocument, and other necessary events on the TransformerHandler instance first.
Then, until the end of the <small> element is detected by the proxy, all events are forwarded to this TransformerHandler instead of the ‘real’ ContentHandler. When the closing </small> tag is encountered, the proxy simulates the end of a document by invoking endDocument on the TransformerHandler. A DOM is now available as the result of the TransformerHandler, which contains only the <small /> fragment.
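A minimal sketch of such a proxy follows. The class name SmallSplitter and the helper extract are made up for illustration, and a few things are assumptions on my part: that the default TransformerFactory can be cast to SAXTransformerFactory (true for the JDK's built-in one), that the parser is namespace-aware, and that <small> itself carries no namespace declarations (prefix-mapping events fired just before its startElement still reach the ‘real’ handler here). It extends XMLFilterImpl only to reuse its event forwarding, and swaps the forwarding target on entry to and exit from <small>.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical proxy: forwards events to the 'real' handler, except inside <small>,
// where they are diverted into an identity TransformerHandler backed by a DOMResult.
class SmallSplitter extends XMLFilterImpl {
    // Assumption: the default factory supports SAX (the JDK's built-in one does).
    private final SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
    private final ContentHandler real;
    private final List<Document> fragments = new ArrayList<>();
    private Locator locator;
    private TransformerHandler capture; // non-null only while inside <small>
    private DOMResult result;
    private int depth;                  // element depth within the captured fragment

    SmallSplitter(ContentHandler real) {
        this.real = real;
        setContentHandler(real);        // 'normal' mode: pass everything through
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;         // remembered so it can be replayed later
        super.setDocumentLocator(locator);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (capture == null && "small".equals(local)) {
            try {
                capture = factory.newTransformerHandler(); // no-op, identity transform
            } catch (TransformerConfigurationException e) {
                throw new SAXException(e);
            }
            result = new DOMResult();
            capture.setResult(result);
            // Simulate the start of a fresh document before any startElement.
            if (locator != null) capture.setDocumentLocator(locator);
            capture.startDocument();
            setContentHandler(capture); // divert all subsequent events
        }
        if (capture != null) depth++;
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        super.endElement(uri, local, qName);
        if (capture != null && --depth == 0) {
            capture.endDocument();      // simulate the end of the sub-document
            fragments.add((Document) result.getNode());
            capture = null;
            setContentHandler(real);    // back to 'normal' mode
        }
    }

    // Parses xml, sending non-<small> events to 'real'; returns one DOM per <small>.
    static List<Document> extract(String xml, ContentHandler real) throws Exception {
        SmallSplitter proxy = new SmallSplitter(real);
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(proxy);
        reader.parse(new InputSource(new StringReader(xml)));
        return proxy.fragments;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<DOC><small><element/></small><entries><entry/></entries></DOC>";
        List<String> seen = new ArrayList<>();
        ContentHandler real = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes a) {
                seen.add(local);        // the 'real' handler never sees <small>
            }
        };
        List<Document> docs = extract(xml, real);
        System.out.println("SAX handler saw: " + seen);
        System.out.println("DOM root: " + docs.get(0).getDocumentElement().getNodeName());
    }
}
```

The depth counter is there so that a <small> nested inside another <small> does not end the capture early. If you need the captured DOM as soon as it is complete rather than after the parse, you could hand it to a callback at the point where it is added to the list.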