I'm using Python to write a crawler. Since I need to parse HTML, I imported lxml, but I get a weird error:
<type 'dict'>
{'xpath': '//ul[@id="i-detail"]/li[1]', 'name': u'\u6807\u9898'}
<type 'dict'>
{'xpath': '//ul[@id="i-detail"]/li[1]', 'name': u'\u6807\u9898'}
<type 'dict'>
{'xpath': '//ul[@id="i-detail"]/li[1]', 'name': u'\u6807\u9898'}
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 477, in run
    self.__target(*self.__args, **self.__kwargs)
  File "fetcher.py", line 78, in run
    self.extractContent(html)
  File "fetcher.py", line 151, in extractContent
    m = tree.xpath(c['xpath'])
AttributeError: 'NoneType' object has no attribute 'xpath'
<type 'dict'>
{'xpath': '//ul[@id="i-detail"]/li[1]', 'name': u'\u6807\u9898'}
Here’s a piece of my code:
for c in self.contents:
    print type(c)
    print c
    m = tree.xpath(c['xpath'])
Please help me with these two questions:
- Why is the type dict, but the error says NoneType?
- I'm trying to match something in the "tree", but it doesn't work. (The website is encoded in GBK; could the encoding cause this kind of problem?)
You are getting an AttributeError, which means that tree has no xpath attribute because it has become None, not that c has no xpath key; that would be a KeyError instead.

Clearly we are missing some code here, where tree is set to None.

You are not printing the result of your tree.xpath() calls, so there is nothing in your code (as shared with us here) that prints m. The tree.xpath() calls could be working fine for all we know.

Reading between the lines and speculating a little, you are assigning the result of tree.xpath() back to tree, and your XPath expression didn't match anything and returned None. The next time into the loop, you now have None instead of an ElementTree node, so the xpath() call fails with an AttributeError.
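
To make the difference between the two exceptions concrete, here is a minimal sketch in Python 3 syntax (your code is Python 2) that does not need lxml at all. The names spec, describe_failure, and run_queries are made up for this illustration and are not from your code, and tree is deliberately set to None to mimic what the traceback implies.

```python
# Illustrative stand-ins only; none of these names come from the original fetcher.py.
spec = {'xpath': '//ul[@id="i-detail"]/li[1]', 'name': '\u6807\u9898'}


def describe_failure(action):
    """Call action() and return the name of the exception it raises, or None."""
    try:
        action()
    except (AttributeError, KeyError) as exc:
        return type(exc).__name__
    return None


tree = None  # assumption: this is what tree was when the traceback was produced

# Looking up .xpath on None fails, even though spec is a perfectly good dict.
attr_failure = describe_failure(lambda: tree.xpath(spec['xpath']))

# A missing dictionary key raises a different exception.
key_failure = describe_failure(lambda: spec['no_such_key'])

print(attr_failure)  # AttributeError
print(key_failure)   # KeyError


def run_queries(tree, specs):
    """Hypothetical guard: fail early with a clear message if tree is None."""
    if tree is None:
        raise ValueError("tree is None; check the code that builds or reassigns it")
    return [(s['name'], tree.xpath(s['xpath'])) for s in specs]
```

A guard like run_queries does not fix the underlying problem, but it moves the failure to a message that points at tree rather than at the dict you were printing.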