The Archive Base

Editorial Team
Asked: May 18, 2026

My test html file is here: http://pastebin.com/L88nYbQY

As you can see there are some unclosed input tags, and some self closing ones.

This causes the following code to return everything from the opening #qcbody div to the end of the file, ignoring the closing div tag.

require 'nokogiri'

f = File.open('t.html', 'r')
@doc = Nokogiri::XML(f)
@doc.at_css('#qcbody').to_html

I’m sure people have gotten around this problem in a variety of ways. How would you do it?

1 Answer

  1. Editorial Team
    Added an answer on May 18, 2026 at 11:58 am

    Give this a try:

    require 'nokogiri'

    # Parse with the lenient HTML parser instead of Nokogiri::XML.
    @doc = File.open('t.html', 'r') { |f| Nokogiri::HTML(f) }
    @doc.at_css('#qcbody').to_html
    

    In IRB:

    >> @doc.at_css('#qcbody').to_html
    => "<div id=\"qcbody\">         \r\n    <form method=\"post\" name=\"form\" id=\"form\" action=\"#\">\r\n      <input type=\"hidden\" name=\"Search Engine\" id=\"Search Engine\"><input type=\"hidden\" name=\"Keyword\" id=\"Keyword\"><input type=\"button\" onclick=\"javascript:validate()\" name=\"sendsubmit\" id=\"sendsubmit\" class=\"submit\">\n</form>\r\n    <div class=\"clear\"></div>\r\n  </div>"
    

    The difference between Nokogiri::XML and Nokogiri::HTML is how lenient the parser is. XML is required to be well-formed, and some XML parsers reject a file that doesn’t meet the standard. Nokogiri lets us set how picky it is. (With XML, you can look at the document’s errors array after parsing to see whether there was a problem.)

    For HTML, Nokogiri relaxes the parser so there’s a better chance of handling real-world HTML. I’ve seen it handle some really ugly markup and keep on going when lesser parsers blew their lunch. If you look at Nokogiri::HTML.parse, its options parameter defaults to XML::ParseOptions::DEFAULT_HTML, which are the relaxed settings. You can override that if you want to make sure the HTML conforms.

© 2021 The Archive Base. All Rights Reserved