I’m way new to working with XML but just had a need dropped in my lap. I have been given an unusual (to me) XML format: there are colons within the tags.
```xml
<THING1:things type="Container">
  <PART1:Id type="Property">1234</PART1:Id>
  <PART1:Name type="Property">The Name</PART1:Name>
</THING1:things>
```
It is a large file and there is much more to it than this, but I hope this format will be familiar to someone. Does anyone know a way to approach an XML document of this sort?
I’d rather not write a brute-force text parser, but I can’t seem to make any headway with REXML or Hpricot, and I suspect it is due to these unusual tags.
My Ruby code:
```ruby
require 'hpricot'

xml = File.open( "myfile.xml" )
doc = Hpricot::XML( xml )

(doc/:things).each do |thg|
  [ 'Id', 'Name' ].each do |el|
    puts "#{el}: #{thg.at(el).innerHTML}"
  end
end
```
…which is just lifted from: http://railstips.org/blog/archives/2006/12/09/parsing-xml-with-hpricot/
And I figured I would be able to work things out from there, but this code returns nothing. It doesn’t raise an error; it just returns.
As @pguardiario mentioned, Nokogiri is the de facto XML and HTML parsing library. If you wanted to print out the `Id` and `Name` values in your example, here is how you would do it:

A few notes:
- `at_xpath` is for matching one thing. If you know you have multiple items, you want to use `xpath` instead.
- `doc.remove_namespaces!` can help (see this answer for a brief discussion).
- You can use the `css` methods instead of `xpath` if you’re more comfortable with those.
- Use `irb` or `pry` to investigate methods.
Update
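To handle multiple items, you need a root element (a well-formed XML document has exactly one), and you need to remove the `//` in the `xpath` query. Inside the loop, a query starting with `//` searches the whole document again rather than the current item, so it keeps returning the first match. This will give you: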
If you are more familiar with CSS selectors, you can use this nearly identical bit of code: