I’m trying to run the following script:
#!python
from urllib import urlopen #urllib.request for python3
from lxml import html
url = 'http://mpk.lodz.pl/rozklady/1_11_D2D3/00d2/00d2t001.htm?r=KOZINY'+\
'%20-%20Srebrzy%F1ska,%20Cmentarna,%20Legion%F3w,%20pl.%20Wolno%B6ci'+\
',%20Pomorska,%20Kili%F1skiego,%20Przybyszewskiego%20-%20LODOWA'
raw_html = urlopen(url).read()
tree = html.fromstring(raw_html) #need to .decode('windows-1250') in python3
ret = tree.xpath('//td [@class!="naglczas"]')
print ret
assert(len(ret)==1)
I expect it to select the one td that doesn’t have its class set to ‘naglczas’. Instead, it returns an empty list. Why is that? I guess there’s some silly reason, but I tried googling and found nothing that would explain it.
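Your XPath expression will find
a td element that has a class attribute whose value is not "naglczas".
You seem to want (since the only 3 td elements with a class have the same class you don’t want)
a td element that does not have a class of "naglczas".
Those might sound similar, but they are different: @class!="naglczas" can only be true for a td that has a class attribute in the first place, so the td without any class is never selected.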
Something like
tree.xpath('//td[not(@class="naglczas")]')
should get you what you want.
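The difference comes from how XPath 1.0 compares a node-set (here, @class) with a string: the comparison is true only if some node in the set satisfies it, and a td with no class attribute yields an empty set. The sketch below is a toy model of that rule in plain Python, not a real XPath engine; the td elements are hypothetical attribute dicts mirroring the three classed td elements plus the one without a class.

```python
# Toy model of XPath 1.0 node-set vs. string comparison (not a real XPath engine).
# Rule: "nodeset OP string" is true iff SOME node in the set satisfies OP.
# An absent attribute gives an empty node-set, so both = and != are false.

def attr_nodes(element, name):
    """Node-set selected by @name: one value if present, empty list if absent."""
    return [element[name]] if name in element else []

def nodeset_eq(nodes, value):
    # Existential: true if any node's string-value equals value.
    return any(n == value for n in nodes)

def nodeset_ne(nodes, value):
    # Also existential: true if any node's string-value differs from value.
    return any(n != value for n in nodes)

# Hypothetical td elements, modelled as attribute dicts.
tds = [
    {"class": "naglczas"},
    {"class": "naglczas"},
    {"class": "naglczas"},
    {},  # the td with no class attribute at all
]

# Equivalent of //td[@class!="naglczas"]
ne_matches = [td for td in tds if nodeset_ne(attr_nodes(td, "class"), "naglczas")]

# Equivalent of //td[not(@class="naglczas")]
not_eq_matches = [td for td in tds if not nodeset_eq(attr_nodes(td, "class"), "naglczas")]

print(len(ne_matches))      # 0
print(len(not_eq_matches))  # 1
```

Both the = and != comparisons are false for the empty node-set, which is why wrapping = in not() is what lets elements lacking the attribute through.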
Also, you don’t need urllib to open the URL; lxml can do that for you with
lxml.html.parse().
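A minimal Python 3 sketch combining both suggestions, under some assumptions: the markup below is hypothetical (three td elements with class "naglczas" and one without a class), and a BytesIO object stands in for the page so the example runs offline. lxml.html.parse accepts a filename, a file-like object or a URL, so the real script could pass the http:// URL straight to it. If non-ASCII text from the real page comes out garbled, the encoding may still need handling, as the question’s .decode('windows-1250') comment hints.

```python
from io import BytesIO

from lxml import html

# Hypothetical stand-in for the downloaded timetable page.
page = BytesIO(
    b"<html><body><table><tr>"
    b'<td class="naglczas">a</td><td class="naglczas">b</td>'
    b'<td class="naglczas">c</td><td>timetable</td>'
    b"</tr></table></body></html>"
)

# In the real script this could be html.parse(url) instead of a file-like object.
tree = html.parse(page)

# Select every td that does NOT have class="naglczas", including tds with no class.
ret = tree.xpath('//td[not(@class="naglczas")]')
print([td.text for td in ret])  # ['timetable']
assert len(ret) == 1
```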