So I have code that looks like this:
require 'nokogiri'
require 'open-uri'
content_url = 'http://auburn.craigslist.org/cpg/index.rss'
doc = Nokogiri::XML(URI.open(content_url))
bq = doc.xpath('//item')
But it returns bq as empty.
I know for sure that it has that tag, though. These are the first few tags on that page:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:ev="http://purl.org/rss/1.0/modules/event/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:admin="http://webns.net/mvcb/">
<channel rdf:about="http://auburn.craigslist.org/cpg/index.rss">...</channel>
<item rdf:about="http://auburn.craigslist.org/cpg/3012277218.html">...</item>
Thoughts?
Since item is in the document's default namespace, and an unprefixed name in an XPath expression only matches elements that have no namespace at all, you need to tell XPath which namespace to look in.
First off, your namespace is what the xmlns attribute is set to. For Craigslist, it appears to be http://purl.org/rss/1.0/. So that is the namespace you have to tell XPath to use.
When calling xpath, though, we have to specify the extra namespaces that we want to use, by passing a hash that maps a prefix to the namespace URI as the second argument.
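That is not all, though: we also need to tell XPath that item is under that namespace. We can do this by prefixing the tag name with the prefix the namespace was registered under. The prefix is just a label chosen for the query, so a name like rdf works as long as it is mapped to http://purl.org/rss/1.0/ in the hash passed to xpath.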