The Archive Base

Editorial Team
Asked: June 4, 2026

I have a 1.6 GB XML file, and when I parse it with SAX Machine


I have a 1.6 GB XML file, and when I parse it with SAX Machine it does not seem to stream the file or consume it in chunks. Instead, it appears to load the whole file into memory (or maybe there is a memory leak somewhere?), because my Ruby process climbs to upwards of 2.5 GB of RAM. I don't know where it stops growing, because I ran out of memory.

The same thing appears to happen on a smaller file (50 MB). My task iterates over the records in the XML file and saves each record to a database. It spends about 30 seconds "idling", and then all of a sudden the database queries start executing.

I thought SAX was supposed to allow you to work with large files like this without loading the whole thing in memory.

Is there something I am overlooking?

Many thanks

Update to add code sample

require 'sax-machine'

class FeedImporter

  class FeedListing
    include ::SAXMachine

    element :id
    element :title
    element :description
    element :url

    def to_hash
      {}.tap do |hash|
        self.class.column_names.each do |key|
          hash[key] = send(key)
        end
      end
    end
  end

  class Feed
    include ::SAXMachine
    elements :listing, :as => :listings, :class => FeedListing
  end

  def perform
    # Ruby's open does not expand "~", so resolve the path explicitly.
    File.open(File.expand_path('~/feeds/large_feed.xml')) do |file|

      # I think that SAXMachine is trying to load all of the listing elements into this one Ruby object.
      puts 'Parsing'
      feed = Feed.parse(file)

      # We are now iterating over each of the listing elements, but they have been "parsed" from the feed already.
      puts 'Importing'
      feed.listings.each do |listing|
        Listing.import(listing.to_hash)
      end

    end
  end

end

As you can see, I don’t care about the <listings> element in the feed. I just want the attributes of each <listing> element.

The output looks like this:

Parsing
... wait forever
Importing (actually, I never see this on the big 1.6 GB file, because too much memory is used)
1 Answer

  1. Editorial Team
    Added an answer on June 4, 2026 at 4:31 pm

    I forked sax-machine so that it uses constant memory: https://github.com/gregwebs/sax-machine

    Good news: there is a new maintainer who is planning on merging my changes.
    The new maintainer and I have been using my fork without issue for a year now.
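
For readers who want to see what constant-memory, event-driven parsing looks like in general, here is a minimal sketch. It does not use sax-machine or the fork's API; it swaps in REXML's stream parser from the Ruby standard distribution. The `ListingListener` class and the `each_listing` helper are hypothetical names made up for this illustration, and the element names are taken from the question's `FeedListing` class. The idea is that each `<listing>` is handed to a block as soon as its closing tag is seen and is then dropped, so no collection of listings is ever built up.

```ruby
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'
require 'stringio'

# Hypothetical listener (not part of sax-machine): builds one hash per
# <listing>, yields it when the element closes, then forgets it.
class ListingListener
  include REXML::StreamListener

  # Field names assumed from the FeedListing class in the question.
  FIELDS = %w[id title description url].freeze

  def initialize(&on_listing)
    @on_listing = on_listing
    @record = nil # hash for the <listing> currently being read
    @field = nil  # name of the field element currently open
  end

  def tag_start(name, _attrs)
    if name == 'listing'
      @record = {}
    elsif @record && FIELDS.include?(name)
      @field = name
      @record[name] = String.new
    end
  end

  # Text may arrive in several pieces, so append rather than assign.
  def text(data)
    @record[@field] << data if @record && @field
  end

  def tag_end(name)
    if name == 'listing'
      @on_listing.call(@record)
      @record = nil
    elsif name == @field
      @field = nil
    end
  end
end

# Hypothetical helper: streams an IO and yields each listing as a hash.
def each_listing(io, &block)
  REXML::Parsers::StreamParser.new(io, ListingListener.new(&block)).parse
end

sample = StringIO.new(<<~XML)
  <listings>
    <listing><id>1</id><title>First</title><url>http://example.com/1</url></listing>
    <listing><id>2</id><title>Second</title><url>http://example.com/2</url></listing>
  </listings>
XML

each_listing(sample) { |listing| puts listing['title'] }
```

In the importer from the question, the block would be the place to call `Listing.import(listing)`, with the `StringIO` replaced by the opened feed file. Note that this sketch uses string keys and ignores CDATA sections and attributes, so it would need adjusting for a real feed.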

© 2021 The Archive Base. All Rights Reserved