I am trying to consume content from an AP web feed, but for some reason what should be a very simple for-each loop is giving me fits. The feed is UTF-8 XML coming from AP, and I'm processing it with PHP's XSLTProcessor and SimpleXML.
The issue is that I cannot target the node I want to loop on. The root element is feed, which contains some properties of the feed followed by several ‘entry’ children, one per article. Each entry has child elements holding its own properties (like copyright), followed by the actual NITF content (lead and body).
It seems like I should be able to just use <xsl:for-each select="feed/entry" />, but whenever I refer to ‘feed’ or ‘entry’ by name I get nothing. I can't even get <xsl:value-of select="feed/id" /> to work. Oddly, //nitf/@version returns the right value, but feed/entry/content/nitf/@version returns nothing.
I can address some content with <xsl:for-each select="//nitf"> to get the article body or any other descendant of the nitf node, but not higher elements (like //entry). The only way I can reach content closer to the root is by nesting <xsl:for-each select="/*"> loops, starting at the root (feed) and drilling down, which just seems wrong.
If anyone can point me in the right direction, I'd REALLY appreciate it; it has been frustrating that something so seemingly simple has had me stuck for a while now.
Format is:
```xml
<feed>
  <id></id>
  <published></published>
  <entry>
    <copyright></copyright>
    <content>
      <nitf>
        <head></head>
        <body></body>
      </nitf>
    </content>
  </entry>
  <entry>
    <content>
      <nitf>
        <head></head>
        <body></body>
      </nitf>
    </content>
  </entry>
</feed>
```
And my stylesheet:

```xml
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="/">
    <!-- this does loop through nitf -->
    <xsl:for-each select="descendant::*/nitf">
      <nitf_title></nitf_title>
    </xsl:for-each>
    <!-- I want to loop on these instead but this never loops -->
    <xsl:for-each select="descendant::*/entry">
      <entry_title></entry_title>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```
Sorry, I was trying to keep it short, so I mocked up the source feed. Here is an actual example:
```xml
<?xml version="1.0" encoding="utf-8" ?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:apcm="http://ap.org/schemas/03/2005/apcm" xmlns:apnm="http://ap.org/schemas/03/2005/apnm" xmlns:georss="http://www.georss.org/georss">
  <id>urn:publicid:ap.org:31998</id>
  <title type="xhtml">
    <apxh:div xmlns:apxh="http://www.w3.org/1999/xhtml">
      <apxh:span>AP Online National News</apxh:span>
    </apxh:div>
  </title>
  <apcm:Property Name="FeedProperties">
    <apcm:Property Name="Entitlement" Id="urn:publicid:ap.org:product:31998" Value="AP Online National News" />
    <apcm:Property Name="FeedSequencing">
      <apcm:Property Name="sequenceNumber" Id="111835329" />
      <apcm:Property Name="minDateTime" Value="2011-06-20T16:56:08.047Z" />
    </apcm:Property>
  </apcm:Property>
  <updated>2011-06-20T16:56:08.047Z</updated>
  <author>
    <name>The Associated Press</name>
    <uri>http://www.ap.org</uri>
  </author>
  <rights></rights>
  <link rel="self" href="http://syndication.ap.org" />
  <entry xmlns="http://www.w3.org/2005/Atom">
    <id>urn:publicid:ap.org:badf779c9d5246b5acb21430ed2214fb</id>
    <title>APFN-US--Gas Drilling-Chemicals</title>
    <updated>2011-06-20T16:56:08.047Z</updated>
    <published>2011-06-20T16:25:39Z</published>
    <author>
      <name>AP</name>
    </author>
    <rights>Copyright 2011</rights>
    <content type="text/xml">
      <nitf version="-//IPTC//DTD NITF 3.4//EN" change.date="October 18, 2006" change.time="19:30" xmlns="">
        <head>
          <docdata>
            <doc-id regsrc="AP" />
            <date.issue norm="20110620T162539Z" />
            <ed-msg info="Eds: APNewsNow." />
            <doc.rights owner="http://www.ap.org" agent="http://license.icopyright.net" type="none" />
            <doc.copyright holder="AP" year="2011" />
          </docdata>
        </head>
        <body>
          <body.head>
            <hedline>
              <hl1 id="headline">Texas becomes 1st to require fracking disclosure</hl1>
              <hl2 id="originalHeadline">Texas becomes 1st to require fracking disclosure</hl2>
            </hedline>
            <distributor>The Associated Press</distributor>
            <dateline>
              <location>HOUSTON</location>
            </dateline>
          </body.head>
          <body.content>
            <block id="Main">
              <p>HOUSTON (AP) — Texas </p>
            </block>
          </body.content>
          <body.end />
        </body>
      </nitf>
    </content>
    <apcm:ContentMetadata xmlns:apcm="http://ap.org/schemas/03/2005/apcm">
      <apcm:DateLineLocation City="Houston" Country="USA" CountryArea="TX" CountryAreaName="Texas" CountryName="United States" />
      <apcm:Priority Numeric="4" Legacy="r" />
      <apcm:ConsumerReady>TRUE</apcm:ConsumerReady>
      <apcm:DateLine>HOUSTON</apcm:DateLine>
    </apcm:ContentMetadata>
  </entry>
  <entry xmlns="http://www.w3.org/2005/Atom">
    <id>urn:publicid:ap.org:57582781c3a841a2b9849231a4abdb63</id>
    <title>US--Medicare-Prevention</title>
    <updated>2011-06-20T16:54:57.963Z</updated>
    <published>2011-06-20T16:54:43Z</published>
    ...
```
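As pointed out in the comments, your issue is caused by the namespaces in the source document. The root feed element declares a default namespace, xmlns="http://www.w3.org/2005/Atom", so feed, entry, id, content and every other unprefixed element beneath it are in the Atom namespace. For example, you're trying to match an element named “entry”, but the actual element's expanded name is {http://www.w3.org/2005/Atom}entry. In XPath 1.0, an unprefixed name such as entry only matches elements that are in no namespace at all. That is also why //nitf works: the nitf element declares xmlns="", which takes it and its descendants back out of the Atom namespace.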
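To see those namespace URIs for yourself, a small diagnostic stylesheet can print each element's local name next to its namespace URI. This is a sketch rather than part of the fix; it relies only on the standard XPath 1.0 functions local-name() and namespace-uri(), and it limits itself to feed, entry and nitf to keep the output short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Diagnostic sketch: print the local name and namespace URI of the
     feed, entry and nitf elements so their expanded names are visible. -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- local-name() ignores the namespace, so this predicate selects the
         elements no matter which namespace they are in -->
    <xsl:for-each select="//*[local-name() = 'feed' or local-name() = 'entry' or local-name() = 'nitf']">
      <xsl:value-of select="local-name()"/>
      <xsl:text> is in namespace [</xsl:text>
      <xsl:value-of select="namespace-uri()"/>
      <xsl:text>]&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample feed, feed and entry should report http://www.w3.org/2005/Atom, while nitf reports an empty URI because of its xmlns="" declaration.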
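You should rewrite your XPath expressions to qualify those names with a prefix and then bind that prefix to the Atom namespace URI. As a result, “entry” becomes “atom:entry”, and an enclosing element in the stylesheet (typically xsl:stylesheet itself) carries the declaration xmlns:atom="http://www.w3.org/2005/Atom". The prefix itself is arbitrary; only the URI has to match the one in the feed. For example, feed/entry/content/nitf/@version becomes atom:feed/atom:entry/atom:content/nitf/@version, with nitf left unprefixed because it is in no namespace.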
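Applied to your stylesheet, a corrected version might look like the sketch below. It assumes XSLT 1.0, which is what PHP's XSLTProcessor (built on libxslt) implements. The entries wrapper element is a hypothetical name added only so the output has a single root; entry_title and nitf_title come from your original stylesheet, and the hl1 path is taken from the sample feed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://www.w3.org/2005/Atom"
    exclude-result-prefixes="atom">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <!-- hypothetical wrapper element so the result has one root -->
    <entries>
      <!-- feed and entry are in the Atom namespace, so they need the atom: prefix -->
      <xsl:for-each select="atom:feed/atom:entry">
        <entry_title>
          <xsl:value-of select="atom:title"/>
        </entry_title>
        <!-- content is still Atom, but nitf declares xmlns="",
             so nitf and everything below it stay unprefixed -->
        <nitf_title>
          <xsl:value-of select="atom:content/nitf/body/body.head/hedline/hl1"/>
        </nitf_title>
      </xsl:for-each>
    </entries>
  </xsl:template>
</xsl:stylesheet>
```

For the first entry in the sample, this should produce an entry_title of APFN-US--Gas Drilling-Chemicals and a nitf_title of Texas becomes 1st to require fracking disclosure. exclude-result-prefixes="atom" keeps an unused xmlns:atom declaration off the output elements. XSLT 1.0 has no way to make unprefixed XPath names match a default namespace, so the prefix is required; XSLT 2.0 added xpath-default-namespace for that purpose. If you also query the feed through SimpleXML, the same rule applies: call registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom') on the SimpleXMLElement before using atom:entry in xpath().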