Question

Editorial Team

Asked: May 14, 2026

I want to parse HTML with lxml using XPath expressions. My problem is matching for the contents of a tag:

For example given the

<a href="http://something">Example</a>

element I can match the href attribute using

.//a[@href='http://something']

but given the expression

.//a[.='Example']

or even

.//a[contains(.,'Example')]

lxml throws the ‘invalid node predicate’ exception.

What am I doing wrong?

EDIT:

Example code:

from io import StringIO

from lxml import etree

html = '<a href="http://something">Example</a>'
parser = etree.HTMLParser()
tree = etree.parse(StringIO(html), parser)

print(tree.find(".//a[text()='Example']").tag)

The expected output is ‘a’, but I get ‘SyntaxError: invalid node predicate’.


1 Answer

Editorial Team · Answer 1 · 2026-05-14T06:44:28+00:00

I would try:

.//a[text()='Example']

using the xpath() method:

tree.xpath(".//a[text()='Example']")[0].tag
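For a node-selecting expression like this one, xpath() returns a list, so indexing with [0] raises IndexError when nothing matches. A minimal sketch of a guarded lookup follows; the helper name first_match is hypothetical and not part of lxml, and etree.HTML() is used here only to keep the example short.

```python
from lxml import etree


def first_match(node, expression):
    """Return the first node selected by an XPath expression, or None.

    Hypothetical helper (not an lxml API); intended for expressions
    that select nodes, for which xpath() returns a list.
    """
    matches = node.xpath(expression)
    return matches[0] if matches else None


# Parse the snippet from the question; etree.HTML() returns the root element.
root = etree.HTML('<a href="http://something">Example</a>')

hit = first_match(root, ".//a[text()='Example']")
miss = first_match(root, ".//a[text()='Missing']")

print(hit.tag if hit is not None else None)
print(miss)
```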

In case you would like to use iterfind(), findall(), find(), or findtext(), keep in mind that advanced features like value comparison and functions are not available in ElementPath.

From the lxml documentation: "lxml.etree supports the simple path syntax of the find, findall and findtext methods on ElementTree and Element, as known from the original ElementTree library (ElementPath). As an lxml specific extension, these classes also provide an xpath() method that supports expressions in the complete XPath syntax, as well as custom extension functions."
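The difference between the two APIs can be sketched side by side. The snippet below is an illustration rather than a definitive test: it runs the three predicates from the question through xpath(), then tries the text() predicate through find() and records whatever happens. The exact error message from find(), and which predicates ElementPath accepts, may vary with the lxml version, so the outcome is captured instead of assumed.

```python
from lxml import etree

# Parse the snippet from the question; etree.HTML() returns the root element.
root = etree.HTML('<a href="http://something">Example</a>')

# xpath() takes full XPath 1.0 expressions: text(), the string value ".",
# and functions such as contains() are all accepted.
expressions = [
    ".//a[text()='Example']",
    ".//a[.='Example']",
    ".//a[contains(., 'Example')]",
]
xpath_tags = {expr: [el.tag for el in root.xpath(expr)] for expr in expressions}

# find() goes through ElementPath instead. The text() predicate is the one
# that failed in the question, so catch SyntaxError and keep its message.
try:
    found = root.find(".//a[text()='Example']")
    find_outcome = found.tag if found is not None else None
except SyntaxError as exc:
    find_outcome = f"SyntaxError: {exc}"

for expr, tags in xpath_tags.items():
    print(expr, "->", tags)
print("find() ->", find_outcome)
```

If the find-style methods are required elsewhere in the code, one option is to keep them for simple paths and switch to xpath() only where a predicate needs functions.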
