Hi, I want to work around a ‘bug’ in certain RSS feeds that use an incorrect namespace for the Media RSS module. I tried manipulating the DOM programmatically, but XSLT seems more flexible to me.
Example:
<media:thumbnail xmlns:media="http://search.yahoo.com/mrss" url="http://www.suedkurier.de/storage/pic/dpa/infoline/brennpunkte/4311018_0_merkelxI_24280028_original.large-4-3-800-199-0-3131-2202.jpg" />
<media:thumbnail url="http://www.suedkurier.de/storage/pic/dpa/infoline/brennpunkte/4311018_0_merkelxI_24280028_original.large-4-3-800-199-0-3131-2202.jpg" />
The namespace should be http://search.yahoo.com/mrss/ (mind the trailing slash).
This is my stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="//*[namespace-uri()='http://search.yahoo.com/mrss']">
    <xsl:element name="{local-name()}" namespace="http://search.yahoo.com/mrss/" >
      <xsl:apply-templates select="@*|*|text()" />
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
Unfortunately, the result of the transformation is invalid XML, and my RSS parser (the ROME library) can no longer parse the feed:
java.lang.IllegalStateException: Root element not set
at org.jdom.Document.getRootElement(Document.java:218)
at com.sun.syndication.io.impl.RSS090Parser.isMyType(RSS090Parser.java:58)
at com.sun.syndication.io.impl.FeedParsers.getParserFor(FeedParsers.java:72)
at com.sun.syndication.io.WireFeedInput.build(WireFeedInput.java:273)
at com.sun.syndication.io.WireFeedInput.build(WireFeedInput.java:251)
... 8 more
What is wrong with my stylesheet?
You have half of the solution in your stylesheet.
You have put in a template to match (and correct) the elements with the wrong Media RSS namespace, but you don’t have anything to match the other elements/attributes in the RSS feed.
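The built-in template rules match the rest of the document’s nodes. They recurse through elements but copy only text nodes into the output, so the element and attribute markup of the original feed is lost. The result is mostly loose text with no single root element, which is consistent with the "Root element not set" exception, and it is no longer a valid RSS document.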
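For reference, the XSLT 1.0 built-in rules behave roughly as if the following templates were part of every stylesheet. This is a sketch of their effect as described in the XSLT 1.0 specification, not something you need to add yourself:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Root and elements: emit nothing for the node itself, just process its children -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: output their string value -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Comments and processing instructions: output nothing -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

Because the element rule never writes the element itself, every element not matched by your own template disappears from the output while its text content remains.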
Adding an identity transform template will ensure that the other nodes and attributes get copied into the output and will preserve the document content/structure.
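A minimal sketch of such a stylesheet, assuming an XSLT 1.0 processor, could look like this. It keeps your namespace-fixing template and adds the identity template. Using name() instead of local-name() to keep the original prefix is my own choice here, not a requirement:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform: by default, copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Elements in the wrong namespace (no trailing slash):
       recreate them in the correct namespace, keeping the original name -->
  <xsl:template match="*[namespace-uri()='http://search.yahoo.com/mrss']">
    <xsl:element name="{name()}" namespace="http://search.yahoo.com/mrss/">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

The more specific template takes precedence over the identity template for the affected elements, so only those are rewritten. Two caveats: attributes that are themselves in the wrong namespace are not handled by this sketch, and the old namespace declaration may still appear on copied ancestor elements, which should be harmless for a namespace-aware parser.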