The Archive Base

Editorial Team
Asked: May 19, 2026


I am fairly new to Ruby and to programming in general, so please bear with me.

My goal is to scrape a table and then save the data to an XML file. The simple script that I’ve written successfully accomplishes both things. The problem I am having is the way the XML is being saved. It doesn’t match the XML that I am used to seeing.

I’ve rummaged through quite a few examples, tutorials and forums but have yet to arrive at a solution.

I am open to any suggestions on a better way to get the data from the table as well, especially since the first three columns are all I really need. HELP!!!

Here is my script:

require 'nokogiri'
require 'open-uri'

url = "http://www.covers.com/pageLoader/pageLoader.aspx?page=/data/nba/team/pastresults/2010-2011/team404085.html"
# URI.open comes from open-uri; a bare open(url) no longer fetches URLs on Ruby 3.0+
doc = Nokogiri::HTML(URI.open(url))

builder = Nokogiri::XML::Builder.new do |xml|
  xml.root {
    xml.items {
       doc.css('.data').each do |o|
        xml.item_content = o
       end
    }
  }
end

File.open('ATL.xml','w'){|f| f.write builder.to_xml}

puts "Scrape Completed."  

Whether it’s saved to an .xml file or printed on the screen in Ruby, the XML looks like this:

<?xml version="1.0"?>
<root>
  <items>
    <item_content=>&lt;table cellpadding="2" cellspacing="1" class="data"&gt;
&lt;tr class="datahead"&gt;
&lt;td width="11%"&gt;Date&lt;/td&gt;&#xD;
    &lt;td width="21%"&gt;Vs&lt;/td&gt;&#xD;
    &lt;td width="18%"&gt;Score&lt;/td&gt;&#xD;
    &lt;td width="27%"&gt;Type&lt;/td&gt;&#xD;
    &lt;td width="13%"&gt;ATL Line&lt;/td&gt;&#xD;
    &lt;td width="10%"&gt;O/U&lt;/td&gt;&#xD;
  &lt;/tr&gt;
&lt;tr class="datarow"&gt;
&lt;td&gt;&#xD;
        01/18/11&lt;/td&gt;&#xD;
      &lt;td&gt;&#xD;
        @ &lt;a href="/pageLoader/pageLoader.aspx?page=/data/nba/team/
team404171.html"&gt;Miami&lt;/a&gt;&#xD;
        &lt;/td&gt;&#xD;
      &lt;td&gt;&#xD;
        W &lt;a href="/pageLoader/pageLoader.aspx?page=/data/nba/
results/2010-2011/boxscore795345.html"&gt;&#xD;
        93-89&lt;/a&gt; (OT)&lt;/td&gt;&#xD;
      &lt;td&gt;&#xD;
        Regular Season&lt;/td&gt;&#xD;
      &lt;td&gt;&#xD;
        W 5.5&lt;/td&gt;&#xD;
      &lt;td&gt;&#xD;
        U 194&lt;/td&gt;&#xD;
    &lt;/tr&gt;

The output above is only a snippet; the full table has 44 rows.
What is the best way to go about doing this?


1 Answer

  1. Editorial Team
    Added an answer on May 19, 2026 at 11:19 am

    It’s not clear what you want as your output: do you want the HTML from the original included in the XML, or just the contents of the HTML? In the future, it helps to include an example of what you wanted along with an example of the problem. Let us cover both cases. First, we can reproduce your problem more simply like so:

    require 'nokogiri'
    doc = Nokogiri::XML <<ENDXML
      <root>
        <p class="foo">42</p>
        <p class="bar">99</p>
        <p class="foo">17</p>
      </root>
    ENDXML
    
    builder = Nokogiri::XML::Builder.new do |xml|
      xml.items {
        doc.css('.foo').each{ |o| xml.item_content = o }
      }
    end    
    puts builder.to_xml
    #=> <?xml version="1.0"?>
    #=> <items>
    #=>   <item_content=>&lt;p class="foo"&gt;42&lt;/p&gt;</item_content=>
    #=>   <item_content=>&lt;p class="foo"&gt;17&lt;/p&gt;</item_content=>
    #=> </items>
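
As for why the equals sign ends up in the tag name: the builder appears to create a tag from whatever method name it receives, and Ruby dispatches an assignment such as xml.item_content = o as a call to a method named item_content=. The sketch below imitates that behaviour with a hypothetical TinyBuilder class. It is an illustration in plain Ruby under that assumption, not Nokogiri's actual implementation.

```ruby
# TinyBuilder is a hypothetical stand-in, NOT Nokogiri's real implementation.
# It turns every unknown method call into a tag named after that method.
class TinyBuilder
  attr_reader :tags

  def initialize
    @tags = []
  end

  def method_missing(name, *args)
    # `b.item_content = x` arrives here with name == :item_content=
    @tags << "<#{name}>#{escape(args.first.to_s)}</#{name}>"
  end

  private

  # Escape the characters that would otherwise be read as markup.
  def escape(text)
    text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
  end
end

b = TinyBuilder.new
b.item_content = '<p class="foo">42</p>' # dispatched as :item_content=
b.item_content('42')                     # dispatched as :item_content

puts b.tags
```

The first call records a tag whose name includes the trailing equals sign and whose markup is escaped as text. The second call records a plain item_content tag.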
    

    If you want only the text contents of your HTML nodes in the XML, and presuming you don’t want the equals sign to be part of the tag name, call item_content as a method with the node’s text instead of assigning to it:

    builder = Nokogiri::XML::Builder.new do |xml|
      xml.items {
        doc.css('.foo').each{ |o| xml.item_content( o.text ) }
      }
    end
    puts builder.to_xml
    #=> <?xml version="1.0"?>
    #=> <items>
    #=>   <item_content>42</item_content>
    #=>   <item_content>17</item_content>
    #=> </items>
    

    If, on the other hand, you did want the raw HTML in your XML, but didn’t want all the entities, then make it a CDATA block:

    builder = Nokogiri::XML::Builder.new do |xml|
      xml.items {
        doc.css('.foo').each{ |o| xml.item_content{ xml.cdata o } }
      }
    end
    puts builder.to_xml
    #=> <?xml version="1.0"?>
    #=> <items>
    #=>   <item_content><![CDATA[<p class="foo">42</p>]]></item_content>
    #=>   <item_content><![CDATA[<p class="foo">17</p>]]></item_content>
    #=> </items>
    

    An XML CDATA block allows you to use characters normally reserved for XML markup without needing to express them as character entities.

© 2021 The Archive Base. All Rights Reserved