The Archive Base

Editorial Team
Asked: May 10, 2026

I have a 2.4 MB XML file, an export from Microsoft Project (hey I’m the victim here!) from which I am requested to extract certain details for re-presentation. Ignoring the intelligence or otherwise of the request, which library should I try first from a Ruby perspective?

I’m aware of the following (in no particular order):

  • REXML
  • Chilkat Ruby XML library
  • hpricot XML
  • libXML

I’d prefer something packaged as a Ruby gem, which I suspect the Chilkat library is not.

Performance isn’t a major issue – I don’t expect the thing to need to run more than once a day (once a week is more likely). I’m more interested in something that’s as easy to use as anything XML-related is able to get.

EDIT: I tried the gemified ones:

hpricot is, by a country mile, easiest. For example, to extract the content of the SaveVersion tag in this XML (saved in a file called, say ‘test.xml’)

    <?xml version='1.0' encoding='UTF-8' standalone='yes'?>
    <Project xmlns='http://schemas.microsoft.com/project'>
        <SaveVersion>12</SaveVersion>
    </Project>

takes something like this:

    require 'hpricot'
    doc = Hpricot.XML(open('test.xml'))
    version = (doc/:Project/:SaveVersion).first.inner_html

hpricot seems relatively unconcerned with namespaces. In this example that is fine, since there is only one, but it could be a problem with a more complex document. Since hpricot is also very slow, I rather imagine this would be a problem that solves itself.
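For comparison, here is a minimal sketch of the same SaveVersion lookup done in a namespace-aware way with REXML, the first library on the list above. It is not something I benchmarked; the `save_version` method name and the `p` prefix are made up for illustration, and the prefix is simply mapped to the default namespace URI from the sample document.

```ruby
require 'rexml/document'

# Namespace URI taken from the sample document above.
PROJECT_NS = 'http://schemas.microsoft.com/project'

# Hypothetical helper: returns the SaveVersion value as an Integer,
# or nil when the element is not present.
def save_version(xml)
  doc = REXML::Document.new(xml)
  # Map the illustrative prefix 'p' to the document's default namespace
  # so the XPath matches the namespaced elements.
  node = REXML::XPath.first(doc, '/p:Project/p:SaveVersion', { 'p' => PROJECT_NS })
  node && node.text.to_i
end
```

With the sample saved as 'test.xml', `save_version(File.read('test.xml'))` should give the version number as an integer.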

libxml-ruby is an order of magnitude faster, understands namespaces (it took me a good couple of hours to figure this out) and is altogether much closer to the XML metal – XPath queries and all the other stuff are in there. This is not necessarily a Good Thing if, like me, you open up an XML document only under conditions of extreme duress. The helper module was mostly helpful in providing examples of how to handle a default namespace effectively. This is roughly what I ended up with (I’m not in any way asserting its beauty, correctness or other value, it’s just where I am right now):

    # Assumes libxml-ruby is already loaded and `path` points at the XML file.
    # Builds a namespace-prefixed XPath query from a plain tag path.
    def xpath_qry(tags, scope = :in_node)
      "#{@scopes[scope]}" + tags.split(/\//).collect { |tag| "#{@ns_prefix}:#{tag}" }.join('/')
    end
    xml_parser = XML::Parser.new
    xml_parser.string = File.read(path)
    doc = xml_parser.parse
    @root = doc.root
    @scopes = { :in_node => '', :in_root => '/', :in_doc => '//' }
    @ns_prefix = 'p'
    # Double quotes are required here so that the #{...} interpolation happens.
    @ns = "#{@ns_prefix}:#{@root.namespace[0].href}"
    version = @root.find_first(xpath_qry('Project/SaveVersion', :in_root), @ns).content.to_i
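To make it clearer what the xpath_qry helper above actually produces, here is a standalone sketch of the same string-building idea with no XML library involved. The `prefixed_xpath` name and its keyword arguments are hypothetical; only the scope table and the prefixing logic mirror the snippet above.

```ruby
# Scope prefixes, mirroring the @scopes hash in the snippet above.
SCOPES = { in_node: '', in_root: '/', in_doc: '//' }.freeze

# Hypothetical standalone version of xpath_qry: puts the namespace
# prefix in front of every tag in a slash-separated path.
def prefixed_xpath(tags, prefix: 'p', scope: :in_node)
  SCOPES.fetch(scope) + tags.split('/').map { |tag| "#{prefix}:#{tag}" }.join('/')
end

puts prefixed_xpath('Project/SaveVersion', scope: :in_root)
```

The point of the helper is only to avoid typing the prefix on every step of the path; the query string still has to be paired with a prefix-to-URI mapping when it is handed to the XPath engine.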

I’m still debating the pros and cons: libxml for its extra rigour, hpricot for the sheer style of _why’s code.

EDIT again, somewhat later: I discovered HappyMapper (‘gem install happymapper’) which is hugely promising, if still at an early stage. It’s declarative and mostly works, although I have spotted a couple of edge cases that I don’t have fixes for yet. It lets you do stuff like this, which parses my Google Reader OPML:

    require 'happymapper'
    module OPML
      class Outline
        include HappyMapper
        tag 'outline'
        attribute :title, String
        attribute :text, String
        attribute :type, String
        attribute :xmlUrl, String
        attribute :htmlUrl, String
        has_many :outlines, Outline
      end
    end
    xml_string = File.read('google-reader-subscriptions.xml')
    sections = OPML::Outline.parse(xml_string)

I already love it, even though it’s not perfect yet.

1 Answer

  1. Added an answer on May 10, 2026 at 3:48 pm

    Hpricot is probably the best tool for you: it is easy to use and should handle a 2 MB file with no problem.

    Speed-wise, libxml should be the best. I used the libxml2 bindings for Python a few months ago (at that time rb-libxml was stale). The streaming interface worked best for me (LibXML::XML::Reader in the Ruby gem). It lets you process a file while it is still downloading, is a bit more user-friendly than SAX, and allowed me to load data from a 30 MB XML file from the internet into a MySQL database in a little more than a minute.
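To give a rough feel for stream-style processing, here is a small sketch. Note that it does not use LibXML::XML::Reader: as a stand-in it uses REXML's event-driven stream parser, which is closer to SAX than to the pull-style Reader but shares the key property of never building the whole tree in memory. The `TagTextCollector` class, the `collect_tag_text` method and the Task/Name element names in the usage note are all illustrative assumptions.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: collects the text of every element with a given
# name while the document streams past, without building a tree.
class TagTextCollector
  include REXML::StreamListener
  attr_reader :values

  def initialize(tag_name)
    @tag_name = tag_name
    @values = []
    @buffer = nil # non-nil only while inside a matching element
  end

  def tag_start(name, _attrs)
    @buffer = +'' if name == @tag_name
  end

  def text(chunk)
    @buffer << chunk if @buffer
  end

  def tag_end(name)
    return unless name == @tag_name && @buffer
    @values << @buffer.strip
    @buffer = nil
  end
end

# Accepts a String or an IO (for example an open File).
def collect_tag_text(source, tag_name)
  listener = TagTextCollector.new(tag_name)
  REXML::Document.parse_stream(source, listener)
  listener.values
end
```

Something like `File.open('test.xml') { |f| collect_tag_text(f, 'Name') }` would then return the text of each Name element as it is encountered. For really large files the libxml reader is likely to be considerably faster.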

© 2021 The Archive Base. All Rights Reserved