The Archive Base

Editorial Team
Asked: June 12, 2026

I have tried everything I can think of to encode the page and then parse it with BeautifulSoup. However, when I run the script, the output still shows raw Unicode results. Can anyone help me handle the encoding correctly with BeautifulSoup?

My code:

import httplib
import urllib
import urllib2
from BeautifulSoup import BeautifulSoup
import HTMLParser


headers={
'Host': 'digitalvita.pitt.edu',
'Connection': 'keep-alive',
'Origin': 'https://digitalvita.pitt.edu',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1',
'Content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Accept': 'text/javascript, text/html, application/xml, text/xml, */*',
'Referer': 'https://digitalvita.pitt.edu/index.php',
'Accept-Encoding': 'gzip,deflate,sdch',
'Accept-Language': 'en-US,en;q=0.8',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
'Cookie': 'PHPSESSID=lvetilatpgs9okgrntk1nvn595'
}

data={
'action':'search',
'xdata':'<search id="1"><context type="all" /><results><ordering>familyName</ordering><pagesize>100000</pagesize><page>1</page></results><terms><name>d</name><school>All</school></terms></search>',
'request':'search'
}

data = urllib.urlencode(data)
print data
req = urllib2.Request('https://digitalvita.pitt.edu/dispatcher.php', data, headers)
response = urllib2.urlopen(req)
the_page = response.read()

htmlCodes = [
    ['&', '&amp;'],
    ['<', '&lt;'],
    ['>', '&gt;'],
    ['"', '&quot;'],
]
htmlCodesReversed = htmlCodes[:]
htmlCodesReversed.reverse()

def htmlEncode(s, codes=htmlCodes):
    """ Returns the HTML encoded version of the given string. This is useful to
        display a plain ASCII text string on a web page."""
    for code in codes:
        s = s.replace(code[1], code[0])
    return s
s=htmlEncode(the_page,codes=htmlCodes)

h = HTMLParser.HTMLParser()
s=h.unescape(s)

s.encode("utf-8")

soup=BeautifulSoup(s,convertEntities=BeautifulSoup.HTML_ENTITIES)
print soup

A sample of the results looks like this:

 &lt;a href="#local" onclick="dvSearch.ToggleInterests(141432);"&gt;&lt;span class="iToggle" id="toggle_141432"&gt;more...&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Znati, Taieb&lt;/span&gt;&lt;span class="email"&gt; (&lt;a href="mailto:znati@pitt.edu"&gt;znati@pitt.edu&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;div class="professionalPosition"&gt;Computer Science, University of Pittsburgh&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Zoffer, H&lt;/span&gt;&lt;span class="email"&gt; (&lt;a href="mailto:zoffer@pitt.edu"&gt;zoffer@pitt.edu&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;div class="professionalPosition"&gt;"KGSB-Dean, Office of", University of Pittsburgh&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Zorn, Kristin&lt;/span&gt;&lt;span class="email"&gt; (&lt;a href="mailto:kzorn@mail.magee.edu"&gt;kzorn@mail.magee.edu&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" 
src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Zou, Chunbin&lt;/span&gt;&lt;span class="email"&gt; (&lt;a href="mailto:chz4@pitt.edu"&gt;chz4@pitt.edu&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;div class="researchInterest"&gt;&lt;b&gt;Research Interests: &lt;/b&gt;fatty liver disease; tyrosine kinase receptor; proteasome endopeptidase complex; phosphatidylcholines; trypanosome; Fas; ubiquitin; pulmonary surfactants; HGF/Met&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Zou, Xiuying&lt;/span&gt;&lt;span class="email"&gt; (&lt;a href="mailto:xiz42@pitt.edu"&gt;xiz42@pitt.edu&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Zrust, Marilyn&lt;/span&gt;&lt;span class="email"&gt; (&lt;a href="mailto:zrustm@pitt.edu"&gt;zrustm@pitt.edu&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;div class="professionalPosition"&gt;Clinical Instructor, Acute/Tertiary Care, University of Pittsburgh School of Nursing&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Zubieta, Juan&lt;/span&gt;&lt;span class="email"&gt; (&lt;a 
href="mailto:zubietajc@upmc.edu"&gt;zubietajc@upmc.edu&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Zuccoli, Giulio&lt;/span&gt;&lt;span class="email"&gt; (&lt;a href="mailto:giz3@pitt.edu"&gt;giz3@pitt.edu&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Zuckerman, Daniel&lt;/span&gt;&lt;span class="email"&gt; (&lt;a href="mailto:ddmmzz@pitt.edu"&gt;ddmmzz@pitt.edu&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;div class="professionalPosition"&gt;Computational Biology, University of Pittsburgh&lt;/div&gt;&lt;div class="researchInterest"&gt;&lt;b&gt;Research Interests: &lt;/b&gt;structural biology; stochastic processes; computer simulation; coarse-grained models; protein dynamics and fluctuations; models, theoretical&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Zuckoff, Allan&lt;/span&gt;&lt;span class="email"&gt; (&lt;a href="mailto:zuckoffa@pitt.edu"&gt;zuckoffa@pitt.edu&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;div class="professionalPosition"&gt;Psychology, University of Pittsburgh&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table 
width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Zuckoff, Allan&lt;/span&gt;&lt;span class="email"&gt; (&lt;a href="mailto:ZuckoffAM@UPMC.EDU"&gt;ZuckoffAM@UPMC.EDU&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;div class="professionalPosition"&gt;Psychiatry, University of Pittsburgh&lt;/div&gt;&lt;div class="researchInterest"&gt;&lt;b&gt;Research Interests: &lt;/b&gt;psychotherapy; substance-related disorders; motivational interviewing; grief treatment ; diagnosis, dual (psychiatry); treatment adherence; patient compliance; traumatic grief and substance abuse&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Zukor, Tevya&lt;/span&gt;&lt;span class="email"&gt; (&lt;a href="mailto:tez5@pitt.edu"&gt;tez5@pitt.edu&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Zuley, Margarita&lt;/span&gt;&lt;span class="email"&gt; (&lt;a href="mailto:zuleyml@upmc.edu"&gt;zuleyml@upmc.edu&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td 
width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Zunino, Paolo&lt;/span&gt;&lt;span class="email"&gt; (&lt;a href="mailto:paz13@pitt.edu"&gt;paz13@pitt.edu&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;div class="professionalPosition"&gt;Mech Eng and Materials Sci, University of Pittsburgh&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Zureikat, Amer&lt;/span&gt;&lt;span class="email"&gt; (&lt;a href="mailto:zureikatah@upmc.edu"&gt;zureikatah@upmc.edu&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Zutter, Chad&lt;/span&gt;&lt;span class="email"&gt; (&lt;a href="mailto:czutter@pitt.edu"&gt;czutter@pitt.edu&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;div class="professionalPosition"&gt;KGSB-Business Admin, University of Pittsburgh&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;table width="100%" cellspacing="5" cellpadding="0"&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img width="70" height="70" src="http://digitalvita.pitt.edu/digital-vitaUI.profileimages/na.jpg" /&gt;&lt;/td&gt;&lt;td width="99%"&gt;&lt;div&gt;&lt;span class="name"&gt; Zyczynski, Halina&lt;/span&gt;&lt;span class="email"&gt; (&lt;a href="mailto:hzyczynski@mail.magee.edu"&gt;hzyczynski@mail.magee.edu&lt;/a&gt;) &lt;/span&gt;&lt;/div&gt;&lt;div class="professionalPosition"&gt;Obstetrics, Gynecology and Reproductive Sciences, University of Pittsburgh&lt;/div&gt;&lt;div 
class="researchInterest"&gt;&lt;b&gt;Research Interests: &lt;/b&gt;pelvic floor reconstruction; rectocele; uterine prolapse; sacralcolpopexy; bladder diseases; colpocleisis; pelic organ prolapse; urinary incontinence&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;

1 Answer

    Editorial Team
    Added an answer on June 12, 2026 at 6:34 am

    It looks like the problem is that you’re mixing up character sets.

    The first thing I’d do is change your Accept-Charset header so that utf-8 is the only charset you ask for by name:

    'Accept-Charset': 'utf-8;q=0.7,*;q=0.3',
    

    Next, the result of response.read() is an 8-bit string, which you have to decode. Since we now know that it’s utf-8, you can do this:

    the_page = response.read().decode('utf-8')
    

    With those two changes, when I run your script, the same fragment comes back as:

     … Self Care&lt;/span&gt;
                                                &lt;a href="#local" onclick="dvSearch.ToggleInterests(…
    

    No more garbage Unicode characters.
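
To see why the decode step matters, here is a minimal sketch written for Python 3 (an assumption; the script in the question targets Python 2). The name used as sample data is hypothetical and only stands in for the bytes that response.read() would return.

```python
# Hypothetical sample data: the UTF-8 bytes a server might send for a name
# containing a non-ASCII letter (n with tilde, U+00F1).
raw = u"Zu\u00f1iga".encode("utf-8")

# Decoding with the charset the server actually used recovers the text.
text = raw.decode("utf-8")

# Decoding the same bytes with the wrong charset succeeds silently but
# turns the two-byte UTF-8 sequence into two unrelated characters.
wrong = raw.decode("latin-1")

print(len(raw), len(text), len(wrong))  # prints: 7 6 7
```

The byte string is one element longer than the decoded text because the single non-ASCII letter occupies two bytes in UTF-8. Decoding as Latin-1 keeps all seven elements, which is how the garbage characters appear.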

    Of course this only works because the server is willing to return utf-8. For a more general case, where you have some servers that can only do utf-8 and others that can only do Latin-1, you need to do something a bit more complicated. Leave the Accept-Charset header alone, and then change the read to look at the response headers. Something like this:

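    response = urllib2.urlopen(req)
    # getparam('charset') reads the charset parameter of the Content-Type header;
    # getencoding() would return the Content-Transfer-Encoding instead.
    charset = response.info().getparam('charset')
    the_page = response.read().decode(charset)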
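
The header-based approach can also be sketched as a small standalone helper. This version is written for Python 3 (an assumption; the original script is Python 2), and the function names and the utf-8 default are hypothetical choices for illustration, not part of the original answer.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, or default."""
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # get_content_charset() returns None when there is no charset parameter.
    return msg.get_content_charset() or default


def decode_body(raw, content_type, default="utf-8"):
    """Decode raw response bytes using the charset the server declared."""
    return raw.decode(charset_from_content_type(content_type, default))


print(charset_from_content_type("text/html; charset=ISO-8859-1"))
print(charset_from_content_type("text/html"))
```

With a live Python 3 response from urllib.request, the same lookup should be available directly as response.headers.get_content_charset(), since the headers object is an email.message.Message subclass. The default only covers a missing charset. A charset name that Python does not recognize would still raise LookupError.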
    

    There are many badly-configured servers that won’t actually return a charset, even when they aren’t returning pure 7-bit ASCII. In that case, you need to either examine what the server returns and hardcode the right answer, or write code to try to detect the proper charset on the fly. Hopefully you’ll never run into this situation…
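
One crude way to guess on the fly is to try a short list of candidate charsets in order. The sketch below is written for Python 3, and the function name and the candidate list are assumptions for illustration. It is a heuristic, not real charset detection, and it can guess wrong for bytes that happen to be valid in more than one encoding.

```python
def decode_with_fallback(raw, candidates=("utf-8", "cp1252")):
    """Try each candidate charset in order; return (text, charset_used)."""
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # latin-1 maps every possible byte to a code point, so this cannot fail.
    return raw.decode("latin-1"), "latin-1"


text, used = decode_with_fallback(b"caf\xe9")
print(used)
```

Strict UTF-8 goes first because arbitrary non-UTF-8 data rarely decodes as UTF-8 without an error, so a successful decode is a fairly strong signal. Latin-1 goes last because it accepts anything.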

© 2021 The Archive Base. All Rights Reserved