I’m trying to find some nodes in an element tree, but this seems to

Asked: May 30, 2026

I’m trying to find some nodes in an element tree, but this seems to work differently depending on which implementation I use for parsing. That doesn’t seem to be consistent with the documentation. Am I missing something?

In [52]: ElementTree.fromstring('<html><x /></html>').find('.//x')
Out[52]: <Element 'x' at 0x3008c10>

but:

In [59]: type(html5lib.parse('<html><x /></html>', treebuilder='lxml').find('.//x'))
Out[59]: <type 'NoneType'>

I’ve also tried html5lib with an ElementTree tree builder, but then parsing doesn’t seem to behave as documented at all; it returns None:

In [72]: parser = html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder('etree', cElementTree))

In [73]: type(parser.parse('<html><x /></html>'))
Out[73]: <type 'NoneType'>

So how do I solve this? I can’t keep using ElementTree directly, since it can’t parse some broken HTML.
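
A minimal sketch of that limitation, using only the standard library: ElementTree's fromstring() is an XML parser, so ordinary HTML that is not well-formed XML (an unclosed <p>, or a void element such as <br> written without a slash) raises ParseError. The helper name parses_as_xml is made up for this illustration.

```python
import xml.etree.ElementTree as ElementTree


def parses_as_xml(text):
    """Hypothetical helper: return True if ElementTree's XML parser accepts the markup."""
    try:
        ElementTree.fromstring(text)
    except ElementTree.ParseError as exc:
        # expat rejects markup that is not well-formed XML and reports the position.
        print("ParseError:", exc)
        return False
    return True


print(parses_as_xml("<html><x /></html>"))                  # well-formed XML
print(parses_as_xml("<html><p>unclosed paragraph</html>"))   # <p> never closed
print(parses_as_xml("<p>line one<br>line two</p>"))          # HTML-style void <br>
```

This is the gap a real HTML parser such as html5lib fills: it follows the HTML5 error-recovery rules instead of rejecting the input.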

1 Answer

Editorial Team · Answer 1 · 2026-05-30T23:08:46+00:00

xpath() seems to work:

>>> doc = html5lib.parse('<!doctype html><html><x /></html>', treebuilder='lxml')

>>> doc.xpath('.//*')
[<Element {http://www.w3.org/1999/xhtml}head at 0x102c04a50>,
 <Element {http://www.w3.org/1999/xhtml}body at 0x102c04aa0>,
 <Element {http://www.w3.org/1999/xhtml}x at 0x102c04af0>]

>>> doc.xpath('.//html:x', namespaces={'html':'http://www.w3.org/1999/xhtml'})
[<Element {http://www.w3.org/1999/xhtml}x at 0x102c04af0>]
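
The same behaviour can be reproduced with the standard library alone. The sketch below assumes that declaring the XHTML default namespace with xmlns gives the same kind of {namespace}localname tags that html5lib puts in its trees. It shows why an unqualified find('.//x') misses, and two ElementTree spellings that should match, the first mirroring the namespaces= mapping passed to xpath() above. lxml's find() accepts the same path syntax.

```python
import xml.etree.ElementTree as ElementTree

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Assumption: a default xmlns declaration yields the same namespaced tags
# ("{uri}localname", a.k.a. Clark notation) that html5lib produces.
root = ElementTree.fromstring(
    '<html xmlns="%s"><head/><body><x/></body></html>' % XHTML_NS
)

# Every tag carries the namespace URI in braces.
tags = [el.tag for el in root.iter()]

# An unqualified path asks for a tag named exactly "x" with no namespace, so it misses.
unqualified = root.find(".//x")

# Fix 1: map a prefix to the URI, like the namespaces= argument to xpath().
by_prefix = root.find(".//html:x", namespaces={"html": XHTML_NS})

# Fix 2: spell the qualified tag out in Clark notation.
by_clark = root.find(".//{%s}x" % XHTML_NS)

print(tags)
print(unqualified, by_prefix, by_clark)
```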

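It may seem strange that html5lib assigns the XHTML namespace to plain HTML, but that is what the HTML5 parsing algorithm prescribes: elements created by the HTML parser live in the http://www.w3.org/1999/xhtml namespace. That is also why the unqualified find('.//x') in the question returns None on the lxml tree.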
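
If namespace-qualified paths are inconvenient, there are two ways around them. html5lib's parse() and HTMLParser accept a namespaceHTMLElements=False argument, which should keep the XHTML namespace out of the tree in the first place (worth checking against the html5lib version in use). Alternatively, the namespaces can be stripped after parsing, as in the sketch below for ElementTree-style trees. strip_namespaces is a hypothetical helper, not part of any library; it only rewrites element tags, not namespaced attributes, and the namespaced tree is simulated with an xmlns declaration rather than produced by html5lib.

```python
import xml.etree.ElementTree as ElementTree


def strip_namespaces(root):
    """Hypothetical helper: rewrite every '{uri}local' tag to plain 'local' in place."""
    for el in root.iter():
        # Only string tags can carry a '{uri}' prefix; skip anything else
        # (e.g. comment nodes, whose tag is a factory function).
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


# Simulated html5lib-style tree: every element is in the XHTML namespace.
root = ElementTree.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><x/></body></html>'
)
strip_namespaces(root)

# Plain, unqualified paths now match.
print(root.find(".//x"))
```

Stripping is convenient for quick scraping, but it discards information, so the namespace-aware queries above are the safer choice when documents may mix HTML with SVG or MathML.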
