The Archive Base


Editorial Team
Asked: May 27, 2026

I am trying to understand Nokogiri. Does anyone have a link to a basic example of a Nokogiri parse/scrape that shows the resulting tree? I think it would really help my understanding.

1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 11:03 am

    Using IRB and Ruby 1.9.2:

    Load Nokogiri:

    > require 'nokogiri'
    #=> true
    

    Parse a document:

    > doc = Nokogiri::HTML('<html><body><p>foobar</p></body></html>')
    #=> #<Nokogiri::HTML::Document:0x1012821a0
          @node_cache = [],
          attr_accessor :errors = [],
          attr_reader :decorators = nil
    

    Nokogiri likes well-formed documents. Note in the output below that it added the DOCTYPE because I parsed the input as a full document. It’s also possible to parse as a document fragment, but that is pretty specialized.

    > doc.to_html
    #=> "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><p>foobar</p></body></html>\n"
    

    Search the document to find the first <p> node using CSS and grab its content:

    > doc.at('p').text
    #=> "foobar"
    

    Use a different method name to do the same thing:

    > doc.at('p').content
    #=> "foobar"
    

    Search the document for all <p> nodes inside the <body> tag, and grab the content of the first one. search returns a NodeSet, which behaves much like an array of nodes.

    > doc.search('body p').first.text
    #=> "foobar"
    

    This is an important point, and one that trips up almost everyone when first using Nokogiri. search and its css and xpath variants return a NodeSet. Calling text or content on a NodeSet concatenates the text of all the returned nodes into a single String, which can make it very difficult to take apart again.

    Using slightly different HTML helps illustrate this:

    > doc = Nokogiri::HTML('<html><body><p>foo</p><p>bar</p></body></html>')
    > puts doc.to_html
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body>
    <p>foo</p>
    <p>bar</p>
    </body></html>
    
    > doc.search('p').text
    #=> "foobar"
    
    > doc.search('p').map(&:text)
    #=> ["foo", "bar"]
    

    Returning to the original HTML (re-parse the first string so that doc again contains a single <p>foobar</p>)…

    Change the content of the node:

    > doc.at('p').content = 'bar'
    #=> "bar"
    

    Emit a parsed document as HTML:

    > doc.to_html
    #=> "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><p>bar</p></body></html>\n"
    

    Remove a node:

    > doc.at('p').remove
    #=> #<Nokogiri::XML::Element:0x80939178 name="p" children=[#<Nokogiri::XML::Text:0x8091a624 "bar">]>
    > doc.to_html
    #=> "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body></body></html>\n"
    

    As for scraping, there are a lot of questions on Stack Overflow about using Nokogiri to tear apart HTML from sites. Searching Stack Overflow for “nokogiri and open-uri” should help.
