I’m trying to transform an XHTML web page with XSLT by extracting some of its parts. For example, I’d like to extract the HEAD and BODY parts separately (this is only the first step; the next will be extracting some divs) and use them in my output XHTML document. Here is the XSLT code:
<xsl:stylesheet version="2.0"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xhtml xsl xs">
  <xsl:output
      method="html"
      omit-xml-declaration="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
      indent="yes"/>
  <xsl:template match="/">
    <HTML>
      <xsl:apply-templates/>
    </HTML>
  </xsl:template>
  <xsl:template match="xhtml:HTML/xhtml:BODY">
    <xsl:copy-of select="." disable-output-escaping="yes" />
  </xsl:template>
  <xsl:template match="xhtml:HTML/xhtml:HEAD">
    <xsl:copy-of select="." disable-output-escaping="yes"/>
  </xsl:template>
</xsl:stylesheet>
As input I use the source code of http://www.wordpress.org/about (it validates).
First the neko purifier runs (HTML->XHTML), then my XSLT transformation. When I look at the output code, everything looks similar:
Original code: codepad.org/5D7MCXSk
Code after transformation: http://codepad.org/fGzyAwF2
However, when I open the result in a web browser I get a “white wall”: nothing appears. I noticed that in the source view of the transformed page (in both Chrome and Firefox) the syntax is highlighted only up to the closing HEAD tag. That is very odd, and I think it is what causes the problem.
Any help would be much appreciated.
Thanks in advance
So it seems that http://codepad.org/5D7MCXSk (code 1) is the same as the source code of http://wordpress.org/about/ (code 2), and that you process this code with the “neko purifier” (is it this one: http://nekohtml.sourceforge.net/ ?), resulting in the document at http://codepad.org/fGzyAwF2 (code 3). Correct me if I’m wrong.
The reason why code 3 doesn’t show anything in the browser seems to be a self-closing <SCRIPT/> at the end of the <HEAD>. YMMV, but in my tests the browsers didn’t seem to like it for some reason.
Your XSLT code is slightly flawed, but if you feed it code 3 as input, it does produce output. The quirk of the input file, that self-closing script element, is preserved in the transformation.
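One way to keep the self-closing form out of the output is to give every childless script element a single space as content, so the serializer has to write a separate end tag. This is only a minimal sketch, not something I have run against code 3; it assumes the upper-case element names and the XHTML namespace used in the stylesheet from the question, and an identity-style copy in place of xsl:copy-of:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">

  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Childless SCRIPT elements (assumed upper-case, as in the question):
       copy the element and its attributes, then add one space as content
       so the result is <SCRIPT ...> </SCRIPT> rather than <SCRIPT .../>. -->
  <xsl:template match="xhtml:SCRIPT[not(node())]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:text> </xsl:text>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

To combine this with the HEAD and BODY templates from the question, those templates would have to call xsl:apply-templates on the element’s content instead of xsl:copy-of, because xsl:copy-of copies the subtree without applying any templates to it.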
Some random notes:
- <xsl:copy-of> doesn’t have a disable-output-escaping attribute
- method="html" is an odd fit for this output, because HTML doesn’t use namespaces (unlike XHTML)
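With those notes applied, the stylesheet from the question might look roughly like the sketch below. It assumes an XSLT 2.0 processor (the stylesheet already declares version="2.0"), which is what makes method="xhtml" available; the explicit select on xsl:apply-templates and the merged HEAD/BODY template are my own simplifications, not something the original code had. As I read the serialization spec, the xhtml method should also avoid the minimized form for empty elements such as script, but please check that against your processor.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="xhtml">

  <!-- method="xhtml" (XSLT 2.0) is intended for namespaced XHTML output. -->
  <xsl:output
      method="xhtml"
      omit-xml-declaration="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
      indent="yes"/>

  <!-- Wrap the extracted parts in a new root element; only HEAD and BODY
       are selected, so no stray text nodes reach the output. -->
  <xsl:template match="/">
    <HTML>
      <xsl:apply-templates
          select="xhtml:HTML/xhtml:HEAD | xhtml:HTML/xhtml:BODY"/>
    </HTML>
  </xsl:template>

  <!-- Deep-copy each part; xsl:copy-of takes no disable-output-escaping. -->
  <xsl:template match="xhtml:HEAD | xhtml:BODY">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

The upper-case element names are kept only because that is what the input in the question uses; XHTML proper expects lower-case names, so the purifier’s output casing may be worth looking into as well.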