I have used lxml on Google App Engine to scrape some basic data.
It works fine with the SDK, but when I try to use it on the App Engine servers I get:
IOError: Error reading file 'http://www.google.com': failed to load external entity "http://www.google.com"
My code looks like this:
import lxml.html
url = "http://www.google.com"
t = lxml.html.parse(url)
pagetitle = t.find(".//title").text
self.response.out.write(pagetitle)
Edit:
I ended up making a small change, as outlined in the answer below:
from google.appengine.api import urlfetch
result = urlfetch.fetch(url)
t = lxml.html.fromstring(result.content)
GAE does not support opening sockets, so lxml.html.parse(url) cannot fetch the page itself. Use
urlfetch.fetch() to get the page contents, then feed them to the parser.
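A minimal sketch of that fetch-then-parse approach, with a couple of guards added. The helper name `fetch_title` and its `fetch` and `parse` parameters are hypothetical, not part of GAE or lxml. It assumes the fetch result exposes `status_code` and `content`, as the result of `urlfetch.fetch()` does, and that the parser returns an element with a `find()` method, as `lxml.html.fromstring()` does.

```python
def fetch_title(url, fetch, parse):
    """Return the text of the page's <title>, or None if unavailable.

    `fetch` and `parse` are passed in (hypothetical parameters) so the
    function itself never opens a socket. On App Engine they would
    typically be urlfetch.fetch and lxml.html.fromstring.
    """
    result = fetch(url)

    # Skip parsing when the request did not succeed.
    if result.status_code != 200:
        return None

    tree = parse(result.content)

    # find() returns None when the page has no <title> element.
    title = tree.find(".//title")
    return title.text if title is not None else None


# Intended use inside a request handler (assumes the GAE runtime):
#   from google.appengine.api import urlfetch
#   import lxml.html
#   pagetitle = fetch_title("http://www.google.com",
#                           urlfetch.fetch, lxml.html.fromstring)
#   self.response.out.write(pagetitle)
```

Checking for None before reading `.text` avoids an AttributeError on pages without a title, which the one-line version in the question would raise.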