The Archive Base



Editorial Team
Asked: June 3, 2026


I have this script:

import urllib2
from BeautifulSoup import BeautifulSoup
import html5lib
import lxml

soup = BeautifulSoup(urllib2.urlopen("http://www.hitmeister.de").read())

But this gives me the following error:

Traceback (most recent call last):
  File "akaConnection.py", line 59, in <module>
    soup = BeautifulSoup(urllib2.urlopen("http://www.hitmeister.de").read())
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 56, column 872

Then I tried this code:

soup = BeautifulSoup(urllib2.urlopen("http://www.hitmeister.de").read(),"lxml") 

or

soup = BeautifulSoup(urllib2.urlopen("http://www.hitmeister.de").read(),"html5lib")

This gives me this error:

Traceback (most recent call last):
  File "akaConnection.py", line 59, in <module>
    soup = BeautifulSoup(urllib2.urlopen("http://www.hitmeister.de").read(),"lxml")
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 156, in goahead
    k = self.parse_declaration(i)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1112, in parse_declaration
    j = HTMLParser.parse_declaration(self, i)
  File "/usr/lib/python2.6/markupbase.py", line 109, in parse_declaration
    self.handle_decl(data)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1097, in handle_decl
    self._toStringSubclass(data, Declaration)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1030, in _toStringSubclass
    self.soup.endData(subclass)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1318, in endData
    (not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'

I am running Ubuntu 10.04 with Python 2.6.5 and BeautifulSoup 3.1.0.1.
How can I fix my code, or is there something I missed?


1 Answer

  1. Editorial Team
    Added an answer on June 3, 2026 at 7:41 pm

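    As suggested in the comments, use pytidylib to clean up the markup before parsing it. BeautifulSoup 3 does not accept a parser name: its second positional argument is parseOnlyThese, which is why passing "lxml" or "html5lib" fails with the AttributeError shown in the question.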

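    import urllib2

    from BeautifulSoup import BeautifulSoup
    from tidylib import tidy_document

    # Fetch the raw (malformed) page.
    html = urllib2.urlopen("http://www.hitmeister.de").read()
    # tidy_document returns the cleaned-up markup and a report of what it changed.
    tidy, errors = tidy_document(html)
    # Parse the tidied markup instead of the original page.
    soup = BeautifulSoup(tidy)
    print type(soup)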
    

    Running this gives:

    (py26_default)[mpenning@Bucksnort ~]$ python foo.py
    <class 'BeautifulSoup.BeautifulSoup'>
    (py26_default)[mpenning@Bucksnort ~]$
    

    The errors from pytidylib were:

    line 53 column 1493 - Warning: '<' + '/' + letter not allowed here
    line 53 column 1518 - Warning: '<' + '/' + letter not allowed here
    line 53 column 1541 - Warning: '<' + '/' + letter not allowed here
    line 53 column 1547 - Warning: '<' + '/' + letter not allowed here
    line 132 column 239 - Warning: '<' + '/' + letter not allowed here
    line 135 column 231 - Warning: '<' + '/' + letter not allowed here
    line 434 column 98 - Warning: replacing invalid character code 156
    line 453 column 96 - Warning: replacing invalid character code 156
    line 780 column 108 - Warning: replacing invalid character code 159
    line 991 column 27 - Warning: replacing invalid character code 156
    line 1018 column 43 - Warning: '<' + '/' + letter not allowed here
    line 1029 column 40 - Warning: '<' + '/' + letter not allowed here
    line 1037 column 126 - Warning: '<' + '/' + letter not allowed here
    line 1039 column 96 - Warning: '<' + '/' + letter not allowed here
    line 1040 column 71 - Warning: '<' + '/' + letter not allowed here
    line 1041 column 58 - Warning: '<' + '/' + letter not allowed here
    line 1047 column 126 - Warning: '<' + '/' + letter not allowed here
    line 1049 column 96 - Warning: '<' + '/' + letter not allowed here
    line 1050 column 72 - Warning: '<' + '/' + letter not allowed here
    line 1051 column 58 - Warning: '<' + '/' + letter not allowed here
    line 1063 column 108 - Warning: '<' + '/' + letter not allowed here
    line 1066 column 58 - Warning: '<' + '/' + letter not allowed here
    line 1076 column 17 - Warning: <input> element not empty or not closed
    line 1121 column 140 - Warning: '<' + '/' + letter not allowed here
    line 1202 column 33 - Error: <g:plusone> is not recognized!
    line 1202 column 33 - Warning: discarding unexpected <g:plusone>
    line 1202 column 88 - Warning: discarding unexpected </g:plusone>
    line 1245 column 86 - Warning: replacing invalid character code 130
    line 1265 column 33 - Warning: entity "&gt" doesn't end in ';'
    line 1345 column 354 - Warning: '<' + '/' + letter not allowed here
    line 1361 column 255 - Warning: unescaped & or unknown entity "&_s_icmp"
    line 1361 column 562 - Warning: unescaped & or unknown entity "&_s_icmp"
    line 1361 column 856 - Warning: unescaped & or unknown entity "&_s_icmp"
    line 1397 column 115 - Warning: replacing invalid character code 130
    line 1425 column 116 - Warning: replacing invalid character code 130
    line 1453 column 115 - Warning: replacing invalid character code 130
    line 1481 column 116 - Warning: replacing invalid character code 130
    line 1509 column 116 - Warning: replacing invalid character code 130
    line 1523 column 251 - Warning: replacing invalid character code 159
    line 1524 column 259 - Warning: replacing invalid character code 159
    line 1524 column 395 - Warning: replacing invalid character code 159
    line 1533 column 151 - Warning: replacing invalid character code 159
    line 1537 column 115 - Warning: replacing invalid character code 130
    line 1565 column 116 - Warning: replacing invalid character code 130
    line 1593 column 116 - Warning: replacing invalid character code 130
    line 1621 column 115 - Warning: replacing invalid character code 130
    line 1649 column 115 - Warning: replacing invalid character code 130
    line 1677 column 115 - Warning: replacing invalid character code 130
    line 1705 column 115 - Warning: replacing invalid character code 130
    line 1750 column 150 - Warning: replacing invalid character code 130
    line 1774 column 150 - Warning: replacing invalid character code 130
    line 1798 column 150 - Warning: replacing invalid character code 130
    line 1822 column 150 - Warning: replacing invalid character code 130
    line 1826 column 78 - Warning: replacing invalid character code 130
    line 1854 column 150 - Warning: replacing invalid character code 130
    line 1878 column 150 - Warning: replacing invalid character code 130
    line 1902 column 150 - Warning: replacing invalid character code 130
    line 1926 column 150 - Warning: replacing invalid character code 130
    line 1954 column 186 - Warning: unescaped & or unknown entity "&charge"
    line 2004 column 100 - Warning: replacing invalid character code 156
    line 2033 column 162 - Warning: replacing invalid character code 159
    line 21 column 1 - Warning: <meta> proprietary attribute "property"
    line 22 column 1 - Warning: <meta> proprietary attribute "property"
    line 23 column 1 - Warning: <meta> proprietary attribute "property"
    line 29 column 1 - Warning: <meta> proprietary attribute "property"
    line 30 column 1 - Warning: <meta> proprietary attribute "property"
    line 31 column 1 - Warning: <meta> proprietary attribute "property"
    line 412 column 9 - Warning: <body> proprietary attribute "itemscope"
    line 412 column 9 - Warning: <body> proprietary attribute "itemtype"
    line 1143 column 1 - Warning: <script> inserting "type" attribute
    line 1225 column 44 - Warning: <table> lacks "summary" attribute
    line 1934 column 9 - Warning: <div> proprietary attribute "name"
    line 436 column 41 - Warning: trimming empty <li>
    line 446 column 89 - Warning: trimming empty <li>
    line 1239 column 33 - Warning: trimming empty <span>
    line 1747 column 37 - Warning: trimming empty <span>
    line 1771 column 37 - Warning: trimming empty <span>
    line 1795 column 37 - Warning: trimming empty <span>
    line 1819 column 37 - Warning: trimming empty <span>
    line 1851 column 37 - Warning: trimming empty <span>
    line 1875 column 37 - Warning: trimming empty <span>
    line 1899 column 37 - Warning: trimming empty <span>
    line 1923 column 37 - Warning: trimming empty <span>
    line 2018 column 49 - Warning: trimming empty <span>
    line 2026 column 49 - Warning: trimming empty <span>
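The report above is long, so it can help to group it by message rather than read it entry by entry. The following is only a minimal sketch: it assumes each entry has the form "line N column M - Severity: message" as shown above, and the names `ENTRY` and `summarize_tidy_report` are hypothetical helpers, not part of pytidylib. It uses `collections.Counter`, which needs Python 2.7 or later, so it would not run unchanged on the Python 2.6 setup from the question.

```python
import re
from collections import Counter

# Assumed entry format, copied from the report above:
#   "line 53 column 1493 - Warning: <message>"
ENTRY = re.compile(r"line (\d+) column (\d+) - (\w+): (.*)")


def summarize_tidy_report(report):
    """Count tidy messages by (severity, message), ignoring positions."""
    counts = Counter()
    for raw in report.splitlines():
        match = ENTRY.match(raw.strip())
        if match is None:
            continue  # skip blank or unrecognized lines
        _line, _column, severity, message = match.groups()
        counts[(severity, message)] += 1
    return counts


# A few entries taken from the report above.
sample = """\
line 53 column 1493 - Warning: '<' + '/' + letter not allowed here
line 53 column 1518 - Warning: '<' + '/' + letter not allowed here
line 434 column 98 - Warning: replacing invalid character code 156
line 1202 column 33 - Error: <g:plusone> is not recognized!
"""

summary = summarize_tidy_report(sample)
for (severity, message), count in summary.most_common():
    print("%3d  %-7s  %s" % (count, severity, message))
```

In the script above, the same function could be called on the `errors` string returned by `tidy_document`, which would show at a glance that most entries are stray closing tags and invalid character codes.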
    