I am running some XSL transforms via Ant’s XSLT task, using Saxon-HE 9 as the processing engine. The input XML files all use the same DTD but declare its location differently: some point to the current directory, some to a subfolder, and others to a URL. Here is the Ant script:
<?xml version="1.0" encoding="UTF-8"?>
<project name="PubXML2EHeader" default="transform">
  <property name="data.dir.input" value="./InputXML"/>
  <property name="data.dir.output" value="./converted-xml"/>
  <property name="xslt.processor.location" value="D:\\saxon9he.jar"/>
  <property name="xslt.processor.factory" value="net.sf.saxon.TransformerFactoryImpl"/>
  <path id="saxon9.classpath" location="${xslt.processor.location}"/>
  <target name="clean">
    <delete dir="${data.dir.output}" includes="*.xml" failonerror="no"/>
  </target>
  <target name="transform" depends="clean">
    <xslt destdir="${data.dir.output}"
          extension=".xml"
          failOnTransformationError="false"
          processor="trax"
          style="Transform.xsl"
          useImplicitFileset="false"
          classpathref="saxon9.classpath">
      <outputproperty name="method" value="xml"/>
      <outputproperty name="indent" value="yes"/>
      <fileset dir="${data.dir.input}" includes="**/*.xml" excludes="Transform.xml"/>
      <factory name="${xslt.processor.factory}"/>
    </xslt>
  </target>
</project>
When I run this Ant script I get errors like this:
[xslt] : Fatal Error! I/O error reported by XML parser processing
file:/D:/annurev.biophys.093008.131228.xml:
http://www.atypon.com/DTD/nlm-dtd/archivearticle.dtd Cause:
java.io.FileNotFoundException:
http://www.atypon.com/DTD/nlm-dtd/archivearticle.dtd
I think these errors occur because Saxon cannot reach the DTD (in this case because of a firewall). I don’t think I care about validating the input, which I assume is what is happening here, and I would like to skip it. Is there an attribute I can add to the XSLT Ant task to stop Saxon from trying to read the DTD?
You are confusing “reading the DTD” with validating. An XSLT processor will always ask the parser to read the external DTD of a document, whether it is validating or not, because a DTD is used for more than validation: it also declares the entities needed to expand entity references, and it can supply default attribute values.
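To illustrate with a simplified, hypothetical input (the element structure is trimmed down, the title text is invented, and the `mdash` entity is assumed to be declared only in the external DTD): no validation is needed to read this document, but the parser still has to load the DTD to find out what `&mdash;` stands for.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article SYSTEM "http://www.atypon.com/DTD/nlm-dtd/archivearticle.dtd">
<!-- Simplified, hypothetical document. The entity "mdash" is assumed to be
     declared only in the external DTD referenced above, so the parser must
     fetch that DTD to expand it, even when it is not validating. -->
<article>
  <article-title>Membrane proteins &mdash; a review</article-title>
</article>
```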
The usual way to deal with this problem is to redirect the DTD reference to a copy stored somewhere accessible, generally by means of catalogs. This involves setting an EntityResolver on the underlying XML parser.
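A minimal OASIS XML Catalog for this case might look like the sketch below. It assumes the remote DTD directory has been downloaded to `dtd/nlm-dtd/` next to the catalog file (a hypothetical path); relative `uri` and `rewritePrefix` values are resolved against the catalog's own location. The `rewriteSystem` entry is there on the assumption that the DTD pulls in further module files from the same remote directory, as the NLM DTDs typically do.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- catalog.xml (hypothetical name): redirects the remote NLM DTD to a local copy.
     Assumes the DTD files were saved under ./dtd/nlm-dtd/ relative to this file. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Exact match on the system identifier from the error message -->
  <system systemId="http://www.atypon.com/DTD/nlm-dtd/archivearticle.dtd"
          uri="dtd/nlm-dtd/archivearticle.dtd"/>
  <!-- Any other file requested from the same remote directory
       (for example DTD modules referenced by archivearticle.dtd) -->
  <rewriteSystem systemIdStartString="http://www.atypon.com/DTD/nlm-dtd/"
                 rewritePrefix="dtd/nlm-dtd/"/>
</catalog>
```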
There is plenty of information on the web about setting up a catalog resolver with Saxon, usually from the command line; see for example http://www.sagehill.net/docbookxsl/UseCatalog.html
The advice is generally to set the -x, -y, and -r options, but in fact only -x is relevant if you only need to redirect DTD references in source documents (-y affects stylesheets, -r affects the document() function). In Ant, the equivalent of setting the -x option is to use the attribute child of the factory element to set the configuration property:
<attribute name="http://saxon.sf.net/feature/sourceParserClass" value="org.apache.xml.resolver.tools.ResolvingXMLReader"/>
That still leaves the part I find tricky, which is actually creating your catalog file.
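Putting the pieces together, the `transform` target from the question might be wired up as follows. This is a sketch under several assumptions not stated above: the Apache XML Commons Resolver jar (which provides `ResolvingXMLReader`) is saved as `D:/resolver.jar`, the catalog is a file named `catalog.xml` beside the build file, and Ant is 1.8.0 or later so that `<xslt>` accepts a nested `<sysproperty>`. The resolver looks up its catalog list in the `xml.catalog.files` system property; `<makeurl>` is used here to turn the file name into an absolute `file:` URL.

```xml
<!-- Sketch only. Assumptions: D:/resolver.jar and catalog.xml are hypothetical
     locations; requires Ant 1.8.0+ for <sysproperty> inside <xslt>. -->

<!-- Replaces the single-entry saxon9.classpath so the resolver classes
     are loaded alongside Saxon -->
<path id="saxon9.classpath">
  <pathelement location="${xslt.processor.location}"/>
  <pathelement location="D:/resolver.jar"/>
</path>

<target name="transform" depends="clean">
  <!-- Absolute file: URL of the catalog, stored in the property catalog.url -->
  <makeurl file="catalog.xml" property="catalog.url"/>
  <xslt destdir="${data.dir.output}"
        extension=".xml"
        failOnTransformationError="false"
        processor="trax"
        style="Transform.xsl"
        useImplicitFileset="false"
        classpathref="saxon9.classpath">
    <outputproperty name="method" value="xml"/>
    <outputproperty name="indent" value="yes"/>
    <fileset dir="${data.dir.input}" includes="**/*.xml" excludes="Transform.xml"/>
    <!-- Tells the Apache resolver which catalog(s) to read -->
    <sysproperty key="xml.catalog.files" value="${catalog.url}"/>
    <factory name="${xslt.processor.factory}">
      <!-- Ant equivalent of Saxon's -x option: parser class for source documents -->
      <attribute name="http://saxon.sf.net/feature/sourceParserClass"
                 value="org.apache.xml.resolver.tools.ResolvingXMLReader"/>
    </factory>
  </xslt>
</target>
```

If Saxon still cannot load `ResolvingXMLReader` from this classpath, putting `resolver.jar` on Ant's own classpath (for example with `ant -lib D:/resolver.jar`) is one alternative to try.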