The Archive Base

Editorial Team
Asked: June 12, 2026

This is my first attempt parsing a webpage using Nokogiri.

I am trying to extract the addresses from a webpage and store them in a CSV file. So far, I’ve only been able to extract the City, State, and Zip fields.

I don’t know how to extract the facility name, address, phone numbers, and company information. The address may contain one or two street components.

For the phone, there may be one or more phone numbers. They may be regular numbers or fax numbers, but the type is indicated only by the label text (“Phone:” or “Fax:”) rather than by a tag. For the company, I’d like to be able to extract the URL and the name.

Each address on the page is enclosed as follows:

  <!-- address entry -->

  <div id='1234' class='address'> 

    <div class='address_header'> 
      <h1 class='header_name'>
        <strong><a href='{URL}'>Facility Name</a></strong>
      </h1>
      <h2 class='header_city'>
        New York
      </h2>
    </div> 

    <div class='address_details'> 
      <div class='info'> 
        <p class='address'>
          <span class='street'>123 ABC St</span><br />
          <span class='street'>Unit 1</span><br />
          <span class='city'>New York</span>, 
          <span class='state'>NY</span> 
          <span class='zip'>10022</span>
        </p>
        <p class='phone'>
          Phone: <span class='tel'>999.999.9999</span>
        </p>
        <p class='phone'>
          Fax: <span class='tel'>888.888.8888</span>
        </p>
        <p class='company'>
          Company: <a href='{URL}'>Company Name</a>
        </p>
      </div>  
    </div> 
  </div>  
  <!-- address entry -->

  <!-- address entry -->

  <div id='4567' class='address'> 

    <div class='address_header'> 
      <h1 class='header_name'>
        <strong><a href='{URL}'>Facility Name</a></strong>
      </h1>
      <h2 class='header_city'>
        New York
      </h2>
    </div> 

    <div class='address_details'> 
      <div class='info'> 
        <p class='address'>
          <span class='street'>456 DEF Rd</span><br />
          <span class='city'>New York</span>, 
          <span class='state'>NY</span> 
          <span class='zip'>10022</span>
        </p>
        <p class='phone'>
          Phone: <span class='tel'>555.555.5555</span>
        </p>
        <p class='company'>
          Company: <a href='{URL}'>Company Name</a>
        </p>
      </div>  
    </div> 
  </div>  
  <!-- address entry -->

Here’s my very basic set-up.

require 'nokogiri'
require 'open-uri'
require 'csv'

# open-uri provides URI.open; a bare open() no longer accepts URLs in Ruby 3.0+.
doc = Nokogiri::HTML(URI.open('[URL]'))

cities = []
states = []
zips = []

doc.css("p[class='address']").css("span[class='city']").each do |city|
  cities << city.content
end

doc.css("p[class='address']").css("span[class='state']").each do |state|
  states << state.content
end

doc.css("p[class='address']").css("span[class='zip']").each do |zip|
  zips << zip.content
end

CSV.open("myCSV.csv", "wb") do |csv|
  csv << ["City", "State", "Zip"]
  (0..cities.length - 1).each do |index|
    csv << [cities[index], states[index], zips[index]]
  end
end

Storing the information in separate arrays here seems very clunky. I’d basically like to make a row entry in a CSV table for each occurrence of the address node in the source document, and then populate it with fields if they exist:

Facility  St_1  St_2  City  State  Zip  Phone  Fax  URL  Company
========  ===== ===== ===== ====== ==== ====== ==== ==== ============
xxxxxxxx  xxxx        xxxx  xxxxx  xxxx xxxxx       xxxx xxxxxxxx
xxxxxxxx  xxxx  xxxxx xxxx  xxxxx  xxxx xxxxx  xxxx xxxx xxxxxxxx

Can someone help me?


1 Answer

  1. Editorial Team
    Added an answer on June 12, 2026 at 12:07 pm

    This handles your example, though the real page probably has edge cases it won’t cover. To use it, change doc to read from the real page instead of the DATA section after __END__, and write the CSV to a file (for example with CSV.open) instead of generating it inline as I’ve done here.

    require 'nokogiri'
    require 'open-uri'
    require 'csv'
    
    doc = Nokogiri::HTML(DATA.read)
    
    # One record per address block; members are listed in CSV column order.
    CompanyInfo   = Struct.new :facility, :street1, :street2, :city, :state, :zip, :phone, :fax, :url, :company
    company_infos = []
    
    doc.css("div.address").each do |address_div|
      facility         = address_div.at_css('.address_header .header_name').text.strip
      info             = address_div.css('div.address_details .info')
      # street2 is nil when the entry has only one street line.
      street1, street2 = info.css('.street').map(&:text)
      city             = info.at_css('.city').text
      state            = info.at_css('.state').text
      zip              = info.at_css('.zip').text
      # Assumes the phone number is listed before the fax number.
      phone, fax       = info.css('.phone .tel').map(&:text)
      url              = info.at_css('.company a')['href']
      company          = info.at_css('.company a').text
    
      company_infos << CompanyInfo.new(facility, street1, street2, city, state, zip, phone, fax, url, company)
    end
    
    # The block parameter has its own name so it does not shadow the result variable.
    csv = CSV.generate do |out|
      out << %w[Facility Street1 Street2 City State Zip Phone Fax URL Company]
      company_infos.each do |company_info|
        out << company_info.to_a
      end
    end
    
    csv # => "Facility,Street1,Street2,City,State,Zip,Phone,Fax,URL,Company\nFacility Name,123 ABC St,Unit 1,New York,NY,10022,999.999.9999,888.888.8888,{URL},Company Name\n"
    
    
    __END__
    <!-- address entry -->
    
    <div id='1234' class='address'> 
    
      <div class='address_header'> 
        <h1 class='header_name'>
          <strong><a href='{URL}'>Facility Name</a></strong>
        </h1>
        <h2 class='header_city'>
          New York
        </h2>
      </div> 
    
      <div class='address_details'> 
        <div class='info'> 
          <p class='address'>
            <span class='street'>123 ABC St</span><br />
            <span class='street'>Unit 1</span><br />
            <span class='city'>New York</span>, 
            <span class='state'>NY</span> 
            <span class='zip'>10022</span>
          </p>
          <p class='phone'>
            Phone: <span class='tel'>999.999.9999</span>
          </p>
          <p class='phone'>
            Fax: <span class='tel'>888.888.8888</span>
          </p>
          <p class='company'>
            Company: <a href='{URL}'>Company Name</a>
          </p>
        </div>  
      </div> 
    </div>
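The two changes mentioned at the top of this answer (reading the real page and writing to a file) might look roughly like the sketch below. The helper name `write_company_csv`, the output path `addresses.csv`, and the `page_url` placeholder are assumptions for illustration, not part of the original script.

```ruby
require 'csv'

# Same member order as the Struct in the script above.
CompanyInfo = Struct.new(:facility, :street1, :street2, :city, :state,
                         :zip, :phone, :fax, :url, :company)

HEADERS = %w[Facility Street1 Street2 City State Zip Phone Fax URL Company].freeze

# Hypothetical helper: writes a header row plus one row per record to `path`.
# nil members (for example a missing second street line) become empty cells.
def write_company_csv(path, company_infos)
  CSV.open(path, 'w') do |out|
    out << HEADERS
    company_infos.each { |info| out << info.to_a }
  end
end

# To parse the live page instead of the DATA section, the document would be
# loaded along these lines (page_url is a placeholder, not from the original):
#   doc = Nokogiri::HTML(URI.open(page_url))

rows = [
  CompanyInfo.new('Facility Name', '456 DEF Rd', nil, 'New York', 'NY',
                  '10022', '555.555.5555', nil, '{URL}', 'Company Name')
]
write_company_csv('addresses.csv', rows)
```

Because the Struct members are already in column order, `to_a` gives the row directly, and entries without a second street line or a fax simply leave those cells empty.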
    
