The Archive Base

Editorial Team
Asked: May 26, 2026

Following the advice in the answer to "subclassing beautifulsoup html parser, getting type error"

Following the advice in the answer to "subclassing beautifulsoup html parser, getting type error", I’m trying to use class composition instead of subclassing BeautifulSoup.

The basic Scraper class works fine on its own (at least in my limited testing).

The Scraper class:

from BeautifulSoup import BeautifulSoup
import urllib2

class Scrape():
    """base class to be subclassed
    basically a wrapper that provides basic url fetching with urllib2
    and basic html parsing with BeautifulSoup.
    some useful methods are provided with class composition with BeautifulSoup.
    for direct access to the soup class you can use the _soup property."""

    def __init__(self,file):
        self._file = file
        #very basic input validation
        #import re

        #import urllib2
        #from BeautifulSoup import BeautifulSoup
        try:
            self._page = urllib2.urlopen(self._file) #fetching the page
        except (urllib2.URLError):
            print ('please enter a valid url starting with http/https/ftp/file')

        self._soup = BeautifulSoup(self._page) #calling the html parser

        #BeautifulSoup.__init__(self,self._page)
        # the next part is the class composition part - we forward attribute and method calls to the BeautifulSoup object
        #search functions:
        self.find = self._soup.find
        self.findAll = self._soup.findAll

        self.__iter__ = self._soup.__iter__ #enables iterating,looping in the object

        self.__len__ = self._soup.__len__
        self.__contains__ = self._soup.__contains__
        #attribute fetching and setting - __getattr__ implemented by the scraper class
        self.__setattr__ = self._soup.__setattr__
        self.__getattribute__ = self._soup.__getattribute__

        #Called to implement evaluation of self[key]
        self.__getitem__ = self._soup.__getitem__
        self.__setitem__ = self._soup.__setitem__
        self.__delitem__ = self._soup.__delitem__

        self.__call__ = self._soup.__call__#Called when the instance is “called” as a function

        self._getAttrMap = self._soup._getAttrMap
        self.has_key = self._soup.has_key

        #walking the html document methods
        self.contents = self._soup.contents
        self.text = self._soup.text
        self.extract = self._soup.extract
        self.next = self._soup.next
        self.parent = self._soup.parent
        self.fetch = self._soup.fetch
        self.fetchText = self._soup.fetchText
        self.findAllNext = self._soup.findAllNext
        self.findChild = self._soup.findChild
        self.findChildren = self._soup.findChildren
        self.findNext = self._soup.findNext
        self.findNextSibling = self._soup.findNextSibling
        self.first = self._soup.first
        self.name = self._soup.name
        self.get = self._soup.get
        self.getString = self._soup.getString


        # comparison operators or similar boolean checks
        self.__eq__ = self._soup.__eq__
        self.__ne__ = self._soup.__ne__
        self.__hash__ = self._soup.__hash__
        self.__nonezero__ = self._soup.__nonzero__ #not sure



        # the class representation magic methods:
        self.__str__ = self._soup.__str__
        self.__repr__ =self._soup.__repr__
        #self.__dict__ = self._soup.__dict__


    def __getattr__(self,method):
        """basically this 'magic' method transforms lookups of unknown attributes into
        find() calls on the soup, which enables traversing the html document with dot notation.
        for example - using instancename.div will return the first div.
        explanation: python calls __getattr__ if it didn't find any method or attribute corresponding to the call.
        I'm not sure this is a good or the right use for the method """

        return self._soup.find(method)

    def clean(self,work=False,element=False):
        """clean method that provides basic cleaning of head, scripts etc.
        input 'work': soup object to clean of unnecessary parts: scripts, head, style.
        optional variable 'element' can take a tuple of elements
        that overrides which elements to clean"""
        self._work = work or self._soup
        self._cleanelements=element or ("head","style","script")

        #for elem in self._work.findAll(self._cleanelements):
        for elem in self.findAll(self._cleanelements):
            elem.extract()

But when I subclass it, I get some sort of recursion loop that I just can’t figure out.

Here is the subclass (the relevant parts):

class MainTraffic(Scrape):
    """class traffic - subclasses the Scrape class
    inputs a page url and a category"""

    def __init__(self, file, cat, caller = False):
        if not caller:
            self._file = file
            #import urllib2
            #self._request = urllib2.Request(self._file)# request to post the show all questions
            Scrape.__init__(self,self._file)
            self.pagecat = cat
            self.clean(self)
            self.cleansoup = self.cleantotable(self)
            self.fetchlinks(self.cleansoup)
            #self.populatequestiondic()
            #del (self.cleansoup)

    def cleantotable(self):
        pass

    def fetchlinks(self,fetch):
        pass

    def length(self):
        from sqlalchemy import func
        self.len = session.query(func.count(Question.id)).scalar()
        return int(self.len)

    def __len__(self):
        return self.length()

    def __repr__(self):
        self.repr = "traffic theory question, current number of questions:{0}".format(self.length())
        return self.repr

    def  __getitem__(self,key):
        try:
            self._item = session.query(Question).filter_by(question_num=key).first()
            return self._item
        except (IndexError, KeyError):
            print "no such key:{0}".format(key)

And here is the error message:

File "C:\Python27\learn\traffic.py", line 117, in __init__
    Scrape.__init__(self,self._file)
  File "C:\Python27\learn\traffic.py", line 26, in __init__
    self._soup = BeautifulSoup(self._page) #calling the html parser
  File "C:\Python27\learn\traffic.py", line 92, in __getattr__
    return self._soup.find(method)
  File "C:\Python27\learn\traffic.py", line 92, in __getattr__
    return self._soup.find(method)
  File "C:\Python27\learn\traffic.py", line 92, in __getattr__
    return self._soup.find(method)
RuntimeError: maximum recursion depth exceeded

I suspect the problem is that I’m misusing __getattr__, but I couldn’t figure out what I should change.

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 4:28 am

    Part 1

    Your code doesn’t work because __getattr__() accesses self._soup before it has been initialized. This happens due to four innocuous-looking lines:

    try:
      self._page = urllib2.urlopen(self._file)
    except (urllib2.URLError):
      print ('please enter a valid url starting with http/https/ftp/file') 
    

    Why do you catch the exception and not actually handle it?

    The next line accesses self._page, which has not been set yet if urlopen() threw an exception:

    self._soup = BeautifulSoup(self._page)
    

    Since self._page hasn’t been set, accessing it calls __getattr__(), which accesses self._soup. That attribute has not been set yet either, so the lookup calls __getattr__() again, and the cycle repeats until Python hits its recursion limit.
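The chain can be reproduced without BeautifulSoup or a network connection. The sketch below is only a stand-in for the Scrape class: `Broken` and `lookups` are hypothetical names, and `__init__` never assigns `_page`, which mimics what happens when urlopen() raises and the exception is swallowed. It is written for Python 3, although the original code targets Python 2.

```python
class Broken(object):
    """Hypothetical stand-in for Scrape: _page is never assigned,
    as happens when urlopen() raises and the exception is swallowed."""

    def __init__(self):
        self.lookups = []              # assigned first, so normal lookup finds it
        try:
            self._soup = self._page    # _page is missing -> __getattr__('_page')
        except RuntimeError as exc:    # RecursionError is a RuntimeError subclass
            self.error = exc

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        self.lookups.append(name)
        return self._soup.find(name)   # _soup is missing too -> __getattr__('_soup')


b = Broken()
print(type(b.error).__name__)   # the recursion-limit error caught in __init__
print(b.lookups[:3])            # first names that reached __getattr__
```

The recorded names start with `_page` and then repeat `_soup`, which matches the repeated `__getattr__` frames in the traceback.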

    The easiest “fix” is to special-case _soup to prevent infinite recursion. Additionally, it seems to make more sense for __getattr__ to simply do normal attribute lookup on soup:

    def __getattr__(self,attr):
      if attr == "_soup":
        raise AttributeError()
      return getattr(self._soup,attr)
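
To see what the guard changes, here is a minimal runnable sketch. `FakeSoup` is a hypothetical stand-in for a BeautifulSoup object, and this `Scrape` takes the soup as an optional argument instead of fetching a URL, so that the "fetch failed, `_soup` never set" case can be simulated. It is written for Python 3.

```python
class FakeSoup(object):
    """Hypothetical stand-in for a BeautifulSoup object."""
    title = "demo page"

    def find(self, name):
        return "<{0}>".format(name)


class Scrape(object):
    def __init__(self, soup=None):
        if soup is not None:       # leave _soup unset to mimic a failed fetch
            self._soup = soup

    def __getattr__(self, attr):
        # Reached only when normal lookup fails. If _soup itself is the
        # missing attribute, stop here instead of recursing.
        if attr == "_soup":
            raise AttributeError(attr)
        return getattr(self._soup, attr)


ok = Scrape(FakeSoup())
print(ok.title)         # attribute delegated to FakeSoup
print(ok.find("div"))   # method call delegated to FakeSoup

broken = Scrape()
try:
    broken.title
except AttributeError as exc:
    print("AttributeError: {0}".format(exc))
```

With the guard in place, a missing `_soup` surfaces as an ordinary AttributeError rather than a recursion error. The swallowed URLError still needs real handling.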
    

    Part 2

    Copying all the methods over is unlikely to work very well, and seems to miss the point of class composition entirely.
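
One way to read this: with composition, the wrapper keeps the soup in a single attribute and forwards whatever it does not define itself, instead of copying bound methods one by one in __init__. The sketch below assumes a hypothetical `FakeSoup` stand-in rather than the real BeautifulSoup class, and is written for Python 3. Special methods such as `__len__` and `__iter__` are defined on the class, because Python 3 (and new-style classes in Python 2) look them up on the type, so assigning them on the instance has no effect on len() or iteration.

```python
class FakeSoup(object):
    """Hypothetical stand-in for a parsed document."""

    def __init__(self, tags):
        self.contents = list(tags)

    def find(self, name):
        for tag in self.contents:
            if tag == name:
                return tag
        return None


class Scrape(object):
    def __init__(self, soup):
        self._soup = soup

    def __getattr__(self, attr):
        # Forward anything Scrape does not define; guard against recursion.
        if attr == "_soup":
            raise AttributeError(attr)
        return getattr(self._soup, attr)

    # Special methods live on the class so len() and iteration find them.
    def __len__(self):
        return len(self._soup.contents)

    def __iter__(self):
        return iter(self._soup.contents)


page = Scrape(FakeSoup(["head", "div", "p"]))
print(len(page))
print(list(page))
print(page.find("div"))
```

A subclass can then override `__len__` or `__getitem__` in the usual way, without the base class having already bound the soup's versions onto each instance.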

© 2021 The Archive Base. All Rights Reserved