I am using BeautifulSoup in Python and am having trouble replacing some tags. I am finding <div> tags and checking for children. If those children do not have children (are a text node of NODE_TYPE = 3), I am copying them to be a <p>.
from BeautifulSoup import Tag, BeautifulSoup

class bar:
    self.soup = BeautifulSoup(self.input)
    foo()

    def foo(self):
        elements = soup.findAll(True)
        for node in elements:
            # ....other stuff here if not <div> tags.
            if node.name.lower() == "div":
                if not node.find('a'):
                    newTag = Tag(self.soup, "p")
                    newTag.setString(node.text)
                    node.replaceWith(newTag)
                    nodesToScore.append(newTag)
                else:
                    for n in node.findAll(True):
                        if n.getString():  # False if has children
                            newTag = Tag(self.soup, "p")
                            newTag.setString(n.text)
                            n.replaceWith(newTag)
I’m getting an AttributeError:
  File "file.py", line 125, in function
    node.replaceWith(newTag)
  File "BeautifulSoup.py", line 131, in replaceWith
    myIndex = self.parent.index(self)
AttributeError: 'NoneType' object has no attribute 'index'
I do the same replacing on node higher up in the for loop and it works correctly. I’m assuming it’s having problems because of the additional iterating through node as n.
What am I doing wrong or what would be a better way to do this? Thanks!
PS. I’m using Python 2.5 on Google App Engine with BeautifulSoup 3.0.8.1.
The error says:

    myIndex = self.parent.index(self)
AttributeError: 'NoneType' object has no attribute 'index'

This code occurs on line 131 of BeautifulSoup.py. It says that self.parent is None.

Looking at the surrounding code shows that self should equal node in your code, since node is calling its replaceWith method. (Note: The error message says node.replaceWith, but the code you posted shows n.replaceWith. The code you posted does not correspond to the error message/traceback.) So apparently node.parent is None.

You could probably avoid the error by placing a check that node.parent is not None at some point in the code before node.replaceWith is called.

Edit: I suggest you use print statements to investigate where in the HTML you are when node.parent is None (i.e. where the error is occurring). Maybe use print node.contents or print node.previous.contents or print node.next.contents to see where you are. Once you see the HTML, it might become obvious what pathological situation you are in that is causing node.parent to be None.
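To make the parent-is-None situation concrete, here is a minimal sketch of one plausible way it can arise. This is an assumption, not something the traceback confirms: if the list of elements is collected before the loop starts, a nested loop may replace a node that the outer loop visits later, and by then that node is detached from the tree. The Node class and the replace_divs function below are hypothetical stand-ins written only for illustration. They are not BeautifulSoup's API; they just mimic the idea that replacing a node looks it up in its parent and then clears its parent.

```python
class Node(object):
    """Toy stand-in for a parse-tree element (hypothetical, not BeautifulSoup)."""

    def __init__(self, name, children=None):
        self.name = name
        self.parent = None
        self.children = []
        for child in (children or []):
            self.append(child)

    def append(self, child):
        child.parent = self
        self.children.append(child)

    def descendants(self):
        """Return all descendants in document order as a list (a snapshot)."""
        found = []
        for child in self.children:
            found.append(child)
            found.extend(child.descendants())
        return found

    def replace_with(self, new_node):
        # Looks the node up in its parent, like the failing line in the
        # traceback. A detached node has parent None, so this raises
        # AttributeError.
        index = self.parent.children.index(self)
        self.parent.children[index] = new_node
        new_node.parent = self.parent
        self.parent = None  # the replaced node is now detached


def replace_divs(root):
    """Replace attached <div> nodes with <p>; return how many were skipped."""
    skipped = 0
    for node in root.descendants():  # snapshot taken before any edits
        if node.name != "div":
            continue
        if node.parent is None:
            # Already replaced by the inner loop of an earlier iteration.
            skipped += 1
            continue
        for inner in node.descendants():
            if inner.parent is not None and not inner.children:
                inner.replace_with(Node("p"))
        node.replace_with(Node("p"))
    return skipped
```

In this sketch the guard on node.parent is what keeps the outer loop from calling replace_with on a node that has already been taken out of the tree. Whether the same thing is happening in your code is something the print statements should confirm.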