I’m trying to find some nodes in an element tree, but this seems to work differently depending on which implementation I use for parsing. That doesn’t seem to be consistent with the documentation. Am I missing something?
In [52]: ElementTree.fromstring('<html><x /></html>').find('.//x')
Out[52]: <Element 'x' at 0x3008c10>
but:
In [59]: type(html5lib.parse('<html><x /></html>', treebuilder='lxml').find('.//x'))
Out[59]: <type 'NoneType'>
I’ve also tried html5lib with ElementTree, but there the parse call doesn’t even seem to behave the way the documentation describes:
In [72]: parser = html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder('etree', cElementTree))
In [73]: type(parser.parse('<html><x /></html>'))
Out[73]: <type 'NoneType'>
So how do I solve this? I can’t keep using ElementTree directly, since it fails to parse some broken HTML.
xpath() seems to work. It’s rather strange, however, that html5lib assigns the XHTML namespace to plain HTML.
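The namespace is the likely reason the plain `.//x` lookup returns None. Here is a minimal sketch of the effect that uses only the standard library: the input string with an explicit `xmlns` is my stand-in for the tree html5lib builds (an assumption, not actual html5lib output), and `XHTML_NS` is just a name chosen for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Stand-in for an html5lib-built tree: every element lives in the
# XHTML namespace (assumption made for this illustration).
root = ET.fromstring('<html xmlns="%s"><x /></html>' % XHTML_NS)

# An unqualified name only matches elements with no namespace.
plain = root.find('.//x')

# Qualifying the name in Clark notation ({namespace}tag) matches.
qualified = root.find('.//{%s}x' % XHTML_NS)

print(plain)          # None
print(qualified.tag)  # {http://www.w3.org/1999/xhtml}x
```

If namespaced lookups are inconvenient, html5lib's `parse()` and `HTMLParser` accept a `namespaceHTMLElements=False` argument that should leave the elements unqualified; check this against the html5lib version you have installed.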