The Archive Base

Editorial Team
Asked: May 28, 2026 (2026-05-28T01:55:20+00:00)

I use Nokogiri to parse an html. I need both the content and image

I use Nokogiri to parse an HTML page. I need both the text content and the image tags in the page, so I use inner_html instead of the content method. However, the string returned by content is encoded correctly, while the string returned by inner_html comes out garbled. Note that the page is in Chinese and does not use UTF-8 encoding.

Here is my code:

# encoding: utf-8
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'iconv'

doc = Nokogiri::HTML.parse(open("http://www.sfzt.org/advise/view.asp?id=536"), nil, 'gb18030')

doc.css('td.font_info').each do |link|
  # Output is correctly encoded, but it is text only (no <img> tags): 目前市面上影响比
  puts link.content

  # Output includes the tags, but the text is garbled: <img ....></img>Ŀǰ??????Ӱ??Ƚϴ?Ľ????
  # Expected: <img ....></img>目前市面上影响比
  puts link.inner_html
end

1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 1:55 am

    This is covered in the ‘Encoding’ section of the Nokogiri README: http://nokogiri.org/

    Strings are always stored as UTF-8 internally. Methods that return
    text values will always return UTF-8 encoded strings. Methods that
    return XML (like to_xml, to_html and inner_html) will return a string
    encoded like the source document.
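
To see what the last sentence of that quote means for the page in the question, here is a small sketch in plain Ruby (no Nokogiri needed). The helper name `misread_as_utf8` is made up for illustration. It assumes the garbled output came from GB18030 bytes being displayed by something that expects UTF-8, which is consistent with the first two garbled characters shown in the question.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of Nokogiri or Ruby): shows how text that is
# really GB18030 looks when its raw bytes are treated as UTF-8.
def misread_as_utf8(text)
  gb_bytes = text.encode('GB18030')   # the bytes a GB18030 document would contain
  gb_bytes.force_encoding('UTF-8')    # relabel the bytes WITHOUT converting them
          .scrub('?')                 # byte sequences invalid in UTF-8 become "?"
end

puts misread_as_utf8('目前')  # prints "Ŀǰ"
```

The bytes never change here, only the label does. That is why content (always UTF-8) prints correctly while markup serialized in the document's own encoding does not.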

    So if you want the inner_html string as UTF-8, you have to convert it yourself:

    puts link.inner_html.encode('utf-8') # for 1.9.x
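
The conversion itself can be tried without fetching the page. The sketch below simulates a GB18030 string of the kind inner_html would hand back and transcodes it with String#encode. The helper `to_utf8`, its `source_encoding` default, and the `a.gif` file name are assumptions for illustration, not part of Nokogiri.

```ruby
# encoding: utf-8

# Hypothetical helper: transcode a markup string to UTF-8.
# `source_encoding` is only consulted when the string carries no useful
# encoding label (ASCII-8BIT / BINARY); otherwise the string's own label is used.
def to_utf8(markup, source_encoding = 'GB18030')
  if markup.encoding == Encoding::BINARY
    markup = markup.dup.force_encoding(source_encoding)
  end
  # Replace anything that cannot be converted instead of raising.
  markup.encode('UTF-8', invalid: :replace, undef: :replace)
end

# Stand-in for what link.inner_html returns for a GB18030 document.
gb_markup = '<img src="a.gif">目前市面上影响比'.encode('GB18030')

utf8_markup = to_utf8(gb_markup)
puts utf8_markup.encoding  # UTF-8
puts utf8_markup           # <img src="a.gif">目前市面上影响比
```

With a real document, `to_utf8(link.inner_html)` would take the place of the simulated string. The replace options trade strictness for robustness, so drop them if you would rather get an exception on bad input.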
    