I want to parse HTML with lxml using XPath expressions. My problem is matching

Editorial Team

Asked: May 14, 2026

I want to parse HTML with lxml using XPath expressions. My problem is matching on the text content of a tag.

For example, given the

<a href="http://something">Example</a>

element, I can match the href attribute using

.//a[@href='http://something']

but given the expression

.//a[.='Example']

or even

.//a[contains(.,'Example')]

lxml throws the ‘invalid node predicate’ exception.

What am I doing wrong?

EDIT:

Example code:

from io import StringIO

from lxml import etree

html = '<a href="http://something">Example</a>'
parser = etree.HTMLParser()
tree = etree.parse(StringIO(html), parser)

print(tree.find(".//a[text()='Example']").tag)

The expected output is ‘a’, but I get ‘SyntaxError: invalid node predicate’.


1 Answer

Editorial Team · Answer 1 · 2026-05-14T06:44:28+00:00

I would try with:

.//a[text()='Example']

using the xpath() method:

tree.xpath(".//a[text()='Example']")[0].tag
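A self-contained version of the above, written as a sketch for Python 3 (the question's snippet uses Python 2's cStringIO and print statement). It reuses the question's HTML and shows that xpath() also accepts the asker's original .='Example' and contains() predicates. The second part uses an illustrative nested input (not from the question) to show that text()='Example' and .='Example' are not interchangeable: text() looks only at the element's own text nodes, while "." compares its full string value.

```python
from io import StringIO

from lxml import etree

# Same input as in the question, parsed with lxml's HTML parser.
html = '<a href="http://something">Example</a>'
tree = etree.parse(StringIO(html), etree.HTMLParser())

# xpath() evaluates full XPath 1.0, so all three predicates are accepted.
by_text_node = tree.xpath(".//a[text()='Example']")
by_string_value = tree.xpath(".//a[.='Example']")
by_contains = tree.xpath(".//a[contains(., 'Example')]")

# xpath() returns a list of matches, so index into it to get an element.
print(by_text_node[0].tag, by_string_value[0].tag, by_contains[0].tag)  # a a a

# Illustrative input: the link text is split across a child element.
nested = etree.fromstring('<p><a>Ex<b>ample</b></a></p>')
# text() only sees the direct text node "Ex", so nothing matches...
print(len(nested.xpath(".//a[text()='Example']")))  # 0
# ...while "." uses the concatenated string value "Example".
print(len(nested.xpath(".//a[.='Example']")))  # 1
```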

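If you want to use iterfind(), findall(), find() or findtext() instead, keep in mind that these methods use ElementPath, which supports only a small set of predicates (such as [@href='http://something']); XPath functions such as text() or contains() and most value comparisons are not available there.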
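To make the ElementPath limitation concrete, here is a sketch against the question's input: find() accepts the attribute predicate from the question, rejects text() with a SyntaxError, and filtering iterfind() results in plain Python is one possible workaround if you prefer to stay with the ElementPath methods. The wording of the SyntaxError message may vary between lxml versions, so the code only relies on the exception type.

```python
from io import StringIO

from lxml import etree

html = '<a href="http://something">Example</a>'
tree = etree.parse(StringIO(html), etree.HTMLParser())

# ElementPath accepts simple predicates such as an attribute comparison.
link = tree.find(".//a[@href='http://something']")
print(link.tag)  # a

# XPath functions like text() are rejected when the path is compiled.
rejected = False
try:
    tree.find(".//a[text()='Example']")
except SyntaxError as exc:
    rejected = True
    print("ElementPath rejected the predicate:", exc)

# Workaround within ElementPath: select candidates, then filter on .text.
matches = [a for a in tree.iterfind(".//a") if a.text == "Example"]
print(matches[0].tag)  # a
```

Note that a.text is only the text before the first child element, so this filter behaves like text()='Example' rather than .='Example'.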

lxml.etree supports the simple path syntax of the find, findall and findtext methods on ElementTree and Element, as known from the original ElementTree library (ElementPath). As an lxml specific extension, these classes also provide an xpath() method that supports expressions in the complete XPath syntax, as well as custom extension functions.
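Because xpath() gives you the complete XPath syntax, the text to match can also be passed in as an XPath variable instead of being pasted into the expression string, which avoids quote-escaping problems when the text comes from user input. The sketch below precompiles the expression with etree.XPath and supplies the variable as a keyword argument; the name find_links_by_text and the variable $label are illustrative choices, not part of the original question or answer.

```python
from io import StringIO

from lxml import etree

html = '<a href="http://something">Example</a>'
tree = etree.parse(StringIO(html), etree.HTMLParser())

# Compile once; $label is an XPath variable bound at call time.
find_links_by_text = etree.XPath(".//a[. = $label]")

# Keyword arguments become XPath variables of the same name.
links = find_links_by_text(tree, label="Example")
print([a.get("href") for a in links])  # ['http://something']
```

The same variable binding works with the method form, e.g. tree.xpath(".//a[. = $label]", label="Example").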
