There’s probably a better way to do this than what I’m doing, because I’m stuck in a metaphorical pothole


Editorial Team

Asked: June 3, 2026


There’s probably a better way to do this than what I’m doing, because
I’m stuck in a metaphorical pothole.

I want to get some of the nodes beneath a particular node. I came up
with this XPath expression:

>>> content_tags = 'h1 h2 h3 h4 h5 h6 p ol ul dl table'.split()
>>> content_xpath = './/*[%s]' % ' or '.join('self::%s' % i for i in content_tags)
>>> content_xpath
'.//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6 or self::p or self::ol or self::ul or self::dl or self::table]'
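To see the duplication without lxml: the standard library's xml.etree.ElementTree supports neither the self:: axis nor or in its limited XPath subset, so this sketch approximates the question's .//*[self::h1 or ...] expression by testing each descendant's tag against the same list. It parses the question's sample markup as XML (an assumption; real pages may need an HTML parser), and all_matching_descendants is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The question's tag list; a set makes membership tests cheap.
CONTENT_TAGS = set('h1 h2 h3 h4 h5 h6 p ol ul dl table'.split())

# The question's sample markup (well-formed, so it parses as XML).
SAMPLE = """<div class="interesting">
  <img src="ignore-this.jpg"/>
  <h1>I want this.</h1>
  <p>I want this, too.</p>
  <div class="sidebar">
    <ul>
      <li><p>I only want one copy of this, inside the UL.</p></li>
      <li><p>Ditto.</p></li>
    </ul>
  </div>
</div>"""


def all_matching_descendants(node, tags):
    """Approximate './/*[self::h1 or ...]': every descendant of node whose
    tag is in tags, in document order, at any depth."""
    return [el for el in node.iter() if el is not node and el.tag in tags]


root = ET.fromstring(SAMPLE)
print([el.tag for el in all_matching_descendants(root, CONTENT_TAGS)])
# ['h1', 'p', 'ul', 'p', 'p']: the two <p>s inside the <ul> come back as well
```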

Any of the listed content_tags can be the top of the hierarchy I’m
after, and I want to ignore other elements that may be at the same
or higher levels. Unfortunately, sometimes there’s a <p> inside a
<ul> or a <table>, or a <table> inside an <ol>, etc., and I get the
inner element as a separate result along with the outer one. Is there a good way to
perform a “cut” to ignore nodes that may be nested inside one that
I’ve found? Or is there some better way of doing this that I’m
somehow missing?
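One way to get the cut without special XPath support is to do the descent in Python: walk down from the context node, keep any element whose tag is in the list, and never look inside a kept element. A minimal sketch using the standard library's xml.etree.ElementTree (an assumption; the question doesn't say which parser is in use), with outermost_matching as a hypothetical helper name and a small made-up document that nests a <p> in a <ul> and a <table> in an <ol>:

```python
import xml.etree.ElementTree as ET

CONTENT_TAGS = set('h1 h2 h3 h4 h5 h6 p ol ul dl table'.split())


def outermost_matching(node, tags):
    """Yield descendants of node whose tag is in tags, in document order,
    without descending into a match: anything nested inside it is cut."""
    for child in node:
        if child.tag in tags:
            yield child  # keep the outer match and stop here
        else:
            # Only non-matching elements are searched further down.
            yield from outermost_matching(child, tags)


root = ET.fromstring(
    '<div><h1>a</h1><div><ul><li><p>b</p></li></ul></div>'
    '<ol><li><table><tr><td>c</td></tr></table></li></ol></div>'
)
print([el.tag for el in outermost_matching(root, CONTENT_TAGS)])
# ['h1', 'ul', 'ol']
```

Because the recursion only continues through non-matching elements, the cost is proportional to the part of the tree above the matches, and the nested <p> and <table> are never visited at all.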

Here’s an example of what I’m trying to parse.

<div class="interesting">
  <img src="ignore-this.jpg"/>
  <h1>I want this.</h1>
  <p>I want this, too.</p>
  <div class="sidebar">
    <ul>
      <li><p>I only want one copy of this, inside the UL.</p></li>
      <li><p>Ditto.</p></li>
    </ul>
  </div>
</div>

Thanks!

BTW, I found a few posts on a w3.org mailing list that advocated a
“dont-include-any-descendant-or-self” filter, which I think would do
exactly what I want, but it doesn’t seem to have made it into the
final spec. 🙁
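That filter isn't part of XPath 1.0, but a similar effect can be had as a post-filter on whatever node list an XPath query returns: drop every match that has another match among its ancestors. A sketch with the standard library's ElementTree, which has no parent pointers, so it builds a child-to-parent map first (with lxml, element.getparent() could replace the map); drop_nested is a hypothetical helper name. In pure XPath 1.0, appending [not(ancestor::*[self::h1 or ... or self::table])] to the question's expression should behave similarly, with the caveat that ancestor:: also looks above the context node.

```python
import xml.etree.ElementTree as ET


def drop_nested(matches, root):
    """Return the matches (in their original order) that have no other
    match among their ancestors, looking no higher than root."""
    # ElementTree elements don't know their parents, so map child -> parent.
    parent = {child: par for par in root.iter() for child in par}
    chosen = set(matches)
    kept = []
    for el in matches:
        anc = parent.get(el)
        # Climb until we hit another match or run off the top of the tree.
        while anc is not None and anc not in chosen:
            anc = parent.get(anc)
        if anc is None:
            kept.append(el)
    return kept


root = ET.fromstring('<div><ul><li><p>x</p></li></ul><p>y</p></div>')
tags = {'p', 'ul'}
# Stand-in for an XPath result that includes nested matches: ul, p(x), p(y).
matches = [el for el in root.iter() if el is not root and el.tag in tags]
kept = drop_nested(matches, root)
print([(el.tag, el.text) for el in kept])
# [('ul', None), ('p', 'y')]
```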


1 Answer

Editorial Team · Answer 1 · 2026-06-03T14:56:02+00:00


Searching with //, as in //p, is explicitly recursive: it matches p elements at any depth. If that’s not what you want, don’t do that! 🙂

If you only want a p that’s directly under an interesting div, but that div can be anywhere in your hierarchy, you can express that as:

//div[@class='interesting']/p
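The standard library's ElementTree understands this part of XPath (attribute predicates and the / versus // distinction), so the difference can be checked directly. A sketch using the question's sample, wrapped in a hypothetical <body> element because ElementTree's .// never matches the element findall is called on:

```python
import xml.etree.ElementTree as ET

# The question's sample, wrapped in <body> so the interesting div is a descendant.
doc = ET.fromstring("""<body><div class="interesting">
  <img src="ignore-this.jpg"/>
  <h1>I want this.</h1>
  <p>I want this, too.</p>
  <div class="sidebar">
    <ul>
      <li><p>I only want one copy of this, inside the UL.</p></li>
      <li><p>Ditto.</p></li>
    </ul>
  </div>
</div></body>""")

# '/p': only <p> elements that are direct children of the interesting div.
direct = doc.findall(".//div[@class='interesting']/p")
# '//p': <p> elements at any depth below it, including those inside the <ul>.
anywhere = doc.findall(".//div[@class='interesting']//p")

print([p.text for p in direct])  # ['I want this, too.']
print(len(anywhere))             # 3
```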

…if you only want a p that’s directly under the location in your tree the search is relative to, that’s even simpler:

./p
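Since the question has a whole list of tags, the relative ./p idea extends to ./* plus a tag test on each child. A sketch with ElementTree, where the context node is the interesting div itself (an assumption about where the search starts); on the question's sample this selects the <h1> and the first <p>, while the <ul> inside the sidebar div is not a direct child and so is not selected.

```python
import xml.etree.ElementTree as ET

CONTENT_TAGS = set('h1 h2 h3 h4 h5 h6 p ol ul dl table'.split())

context = ET.fromstring("""<div class="interesting">
  <img src="ignore-this.jpg"/>
  <h1>I want this.</h1>
  <p>I want this, too.</p>
  <div class="sidebar">
    <ul>
      <li><p>I only want one copy of this, inside the UL.</p></li>
      <li><p>Ditto.</p></li>
    </ul>
  </div>
</div>""")

# './p' relative to the context node: direct children only.
print([p.text for p in context.findall('./p')])  # ['I want this, too.']

# './*' filtered by tag: direct children whose tag is in the question's list.
direct_content = [el.tag for el in context.findall('./*') if el.tag in CONTENT_TAGS]
print(direct_content)  # ['h1', 'p']
```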
