I am using lxml’s xpath function to retrieve parts of a webpage. I am

Question

0

Asked: May 17, 20262026-05-17T23:58:10+00:00 2026-05-17T23:58:10+00:00

I am using lxml’s xpath function to retrieve parts of a webpage. I am

0

I am using lxml’s xpath function to retrieve parts of a webpage. I am trying to get contents of a <font> tag, which includes html tags of its own. If I use

//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]

I get the right amount of nodes, but they are returned as lxml objects (<Element font at 0x101fe5eb0>).

If I use

//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]/text()

I get exactly what I want, except that I don’t get any of the HTML code which is contained within the <font> nodes.

If I use

//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]/node()

if get a mixture of text and lxml elements! (e.g. something something <Element a at 0x102ac2140> something)

Is there anyway to use a pure XPath query to get the contents of the <font> nodes, or even to force lxml to return a string of the contents from the .xpath() method, rather than an lxml object?

Note that I’m returning a list of many nodes from the XPath query so the solution needs to support that.

just to clarify… i want to return something something <a href="url">inside</a> something from something like…

<font face="verdana" color="#ffffff" size="2"><a href="url">inside</a> something</font>

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-17T23:58:10+00:00

I’m not sure I understand — is this close to what you are looking for?

import lxml.etree as le
import cStringIO
content='''\
<font face="verdana" color="#ffffff" size="2"><a href="url">inside</a> something</font>
'''
doc=le.parse(cStringIO.StringIO(content))

xpath='//font[@face="verdana" and @color="#ffffff" and @size="2"]/child::*'
x=doc.xpath(xpath)
print(map(le.tostring,x))
# ['<a href="url">inside</a> something']

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am using lxml’s xpath function to retrieve parts of a webpage. I am

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply