The Archive Base

Editorial Team
Asked: June 16, 2026

I want to scrape the content of websites with Python. Just like this:

Apple’s stock continued to dominate the news over the weekend, with Barron’s placing it on the top of its favorite 2013 stock list.

But printing it gives a garbled result:

Apple âs stock continued to dominate the news over the weekend, with Barronâs placing it on the top of its favorite 2013 stock list.

The “’” character is not displayed correctly. Here is my code:

    #-*- coding: utf-8 -*-

    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
    import urllib
    from lxml import *
    import urllib
    import lxml.html as HTML

    url = "http://www.forbes.com/sites/panosmourdoukoutas/2012/12/09/apple-tops-barrons-10-favorite-stocks-for-2013/?partner=yahootix"
    sock = urllib.urlopen(url)
    htmlSource = sock.read()
    sock.close()

    root = HTML.document_fromstring(htmlSource)
    contents = ' '.join([x.strip() for x in root.xpath("//div[@class='body']/descendant::text()")])

    print contents

    f = open('C:/Users/yinyao/Desktop/Python Code/data.txt','w')
    f.write(contents)
    f.close()

However, even after setting the default encoding, print still does not show the text correctly. Why, and what should I do?
I’m using Windows, where the default encoding is GBK.

1 Answer

  1. Editorial Team
    Added an answer on June 16, 2026 at 6:17 pm

    First, read Joel Spolsky's article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".

    Second, always use Unicode internally. Decode early, encode late: when you scrape a website, decode the response to Unicode and process it as Unicode throughout your script. Otherwise your code will crash at unpredictable points, for example when it encounters an unexpected character in a comment on a Chinese web page. Encode only when you pass the text on to something else (e.g., a writable stream), preferably as UTF-8.
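
    The "decode early, encode late" rule can be sketched as follows. This is a minimal illustration in Python 3 (where `str` is already Unicode), not the asker's exact setup: the `raw` bytes stand in for what `sock.read()` returns, the page is assumed to be UTF-8, and the output path is hypothetical. Decoding the same bytes as Latin-1 is shown only because it produces garbling of the kind seen in the question, which is a likely (not confirmed) cause there.

```python
import io
import os
import tempfile

# Assumption: these bytes stand in for a UTF-8 page body read from the network.
raw = "Apple\u2019s stock and Barron\u2019s list".encode("utf-8")

# Wrong codec: each of the three UTF-8 bytes of U+2019 becomes its own
# character, so the apostrophe turns into "\u00e2" plus two control characters.
garbled = raw.decode("latin-1")

# Decode early: convert bytes to Unicode text once, with the page's real encoding.
text = raw.decode("utf-8")

# Process as Unicode internally (here: normalize whitespace).
cleaned = " ".join(text.split())

# Encode late: choose the encoding only at the output boundary.
out_dir = tempfile.mkdtemp()                 # hypothetical output location
path = os.path.join(out_dir, "data.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(cleaned)

# Reading the file back with the same encoding returns the original text.
with io.open(path, "r", encoding="utf-8") as f:
    round_tripped = f.read()
```

    Opening the file with an explicit `encoding` keeps the result independent of the platform default, which matters on a Windows machine whose default is GBK.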

    Third, use BeautifulSoup 4, which detects the page's encoding and returns Unicode text.
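
    A rough sketch of the BeautifulSoup 4 approach, assuming the third-party `beautifulsoup4` package is installed and Python 3 is used. The `html_bytes` document is a made-up stand-in for the downloaded page, and the `div` with class `body` mirrors the XPath in the question rather than the real page structure.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Assumption: a small UTF-8 document standing in for the downloaded page bytes.
html_bytes = (
    "<html><head><meta charset='utf-8'></head><body>"
    "<div class='body'><p>Apple\u2019s stock</p><p>Barron\u2019s list</p></div>"
    "</body></html>"
).encode("utf-8")

# Passing raw bytes lets Beautiful Soup work out the encoding and
# return Unicode strings from then on.
soup = BeautifulSoup(html_bytes, "html.parser")

# Collect the text nodes under the target element, stripped of whitespace.
body = soup.find("div", class_="body")
contents = " ".join(body.stripped_strings)
```

    From here `contents` is ordinary Unicode text, so it can be written out with an explicit UTF-8 encoding at the end of the script.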
