Reference page The XML is embedded under the <pre> tag of the returned HTML

Question

Asked: June 16, 2026

The XML is embedded under the <pre> tag of the returned HTML page.
I can extract the contents of the <pre> tag, but I am unable to convert this to XML correctly.
I tried using the to_xml method of the NodeSet class, but it seems that the line endings (\n) are messing up the parsing.

Here is a snippet of my code:

url = "http://www.ncbi.nlm.nih.gov/pubmed/?term=NS044283[GR]&dispmax=200&report=xml"
doc = Nokogiri::XML(open(url))
pre = doc.xpath('//pre')
xml = pre.to_xml
contents = Nokogiri::XML(xml)
articles = contents.xpath('\\PubmedArticle')
(article = [])

1 Answer

Editorial Team · Answer 1 · 2026-06-16T18:53:09+00:00

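Since you’re going to parse it with Nokogiri anyway, call text instead of to_xml on the <pre> node set. to_xml serializes the node back into markup, whereas text returns the decoded text content of the <pre> tag, which you can wrap in a single root element and parse as XML: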

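require 'nokogiri'
require 'open-uri'

url = "http://www.ncbi.nlm.nih.gov/pubmed/?term=NS044283[GR]&dispmax=200&report=xml"
# On Ruby 3.0+, open-uri is used through URI.open; bare open(url) no longer fetches URLs.
doc = Nokogiri::XML(URI.open(url))
pre = doc.xpath('//pre')
# text returns the decoded contents of <pre>; the wrapper gives the parser one root element.
xml = "<root>" + pre.text + "</root>"
contents = Nokogiri::XML(xml)
articles = contents.css('PubmedArticle')
puts contents.css('ArticleTitle').map { |x| x.content }.count
# => 25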