The Archive Base

Editorial Team
Asked: June 18, 2026 (2026-06-18T14:43:19+00:00)

I have a large local XML file (24 GB) with a structure like this:

<id>****</id>
<url> ****</url> (several times within an id...)

I need a result like this:

id1;url1
id1;url2
id1;url3
id2;url4
....

I wanted to use Nokogiri, either with the SAX parser or the Reader, since I can’t load the whole file into memory. I am using a Rake task to execute the code.

My code with SAX is:

task :fetch_saxxml => :environment do

  require 'nokogiri'
  require 'open-uri'

  class MyDocument < Nokogiri::XML::SAX::Document
    attr_accessor :is_name

    def initialize
      @is_name = false
    end

    def start_element name, attributes = []
      @is_name = name.eql?("id")
    end

    def characters string
      string.strip!
      if @is_name and !string.empty?
        puts "ID: #{string}"
      end
    end

    def end_document
      puts "the document has ended"
    end

  end

  parser = Nokogiri::XML::SAX::Parser.new(MyDocument.new)
  parser.parse_file('/path_to_my_file.xml')

end

That works for fetching the IDs in the file, but I also need to fetch the URLs that belong to each id.

How do I put something like “each do” within that code to fetch the URLs and have an output like that shown above? Or is it possible to call several actions within “characters”?


1 Answer

  1. Editorial Team
    Added an answer on June 18, 2026 at 2:43 pm (2026-06-18T14:43:20+00:00)

    This is one way to handle several child nodes as they occur. One catch with SAX parsers is that text containing special characters such as “&” can arrive in more than one `characters` call, so you have to find a way to handle that, but that is another story.

    Here is my code:

    require 'nokogiri'

    class MyDoc < Nokogiri::XML::SAX::Document
      # Placeholders: replace with the child element names you want to print.
      SUBNODES = %w[your_subnodes here].freeze

      def start_element(name, attrs = [])
        # 'yourvalue' is a placeholder for the enclosing element.
        @inside_content = true if name == 'yourvalue'
        @current_element = name
      end

      def characters(str)
        if @current_element == 'your_1st_subnode'
          # handle the text of the first subnode here
        elsif @current_element == 'your_2nd_subnode'
          # handle the text of the second subnode here
        end
        puts "#{@current_element} - #{str}" if @inside_content && SUBNODES.include?(@current_element)
      end

      def end_element(name)
        @inside_content = false if name == 'yourvalue'
        @current_element = nil
      end
    end

    parser = Nokogiri::XML::SAX::Parser.new(MyDoc.new)
    parser.parse_file('/path_to_your.xml')
    

© 2021 The Archive Base. All Rights Reserved