I’m trying to clean some htmls. I have converted them to xhtml with tidy
$ tidy -asxml -i -w 150 -o o.xml index.html
The resulting xhtml ends up having named entities.
When trying xsltproc on those xhtmls, I keep getting errors.
$ xsltproc --novalid -o out.htm t.xsl o.xml
o.xml:873: parser error : Entity 'mdash' not defined
resources to storing data and using permissions — as needed.</
^
o.xml:914: parser error : Entity 'uarr' not defined
</div><a href="index.html#top" style="float:right">↑ Go to top</a>
^
o.xml:924: parser error : Entity 'nbsp' not defined
Android 3.2 r1 - 27 Jul 2011 12:18
If I add –html to the xsltproc it complains on a tag that has name and id attributes with same name (which is valid)
$ xsltproc --novalid --html -o out.htm t.xsl o.xml o.xml:845: element a: validity error : ID top already defined
<a name="top" id="top"></a>
^
The xslt is simple:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes" omit-xml-declaration="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="//*[@id=side-nav]"/>
</xsl:stylesheet>
Why doesn’t –html work? Why is it complaining? Or should I forget it and fix the entities?
I am assuming that the unclearly stated question is this: I know how to avoid “Entity ‘XXX’ not defined” errors when running xsltproc (add
--html). But how do I get rid of “ID YYY already defined”?Recent builds of Tidy have an anchor-as-name option. You can set it to “no” to remove unwanted
nameattributes: