Suppose I have an HTML table with the following rows, … <tr> <th title=Library

Question


Asked: June 13, 2026 (2026-06-13T05:55:48+00:00)


Suppose I have an HTML table with the following rows,

...
<tr>
  <th title="Library of Quintessential Memes">LQM:</th>
  <td>
    <a href="docs/lqm.html"><b>Intro</b></a>
    <a href="P/P79/">79</a>
    <a href="P/P80/">80</a>
    <a href="P/P81/">81</a>
    <a href="P/P82/">82</a>
  </td>
</tr>
<tr>
  <th title="Library of Boring Books">LBB:</th>
  <td>
    <a href="docs/lbb.html"><b>Intro</b></a>
    <a href="R/R80/">80</a>
    <a href="R/R81/">81</a>
    <a href="R/R82/">82</a>
    <a href="R/R83/">83</a>
    <a href="R/R84/">84</a>
  </td>
</tr>
...

I would like to select all <a> elements inside a <td> whose associated <th> text belongs to a small, fixed set of titles (e.g. LQM, LBR, and RTT). How can I formulate this as an XPath query?

EDIT: I am using Scrapy, a Python scraping toolkit, so if it is easier to phrase this query as a set of smaller queries, I would be more than happy to use that. For example, if I could select all <tr> elements whose first <th> child matches a regex, then select all <a> descendants of the remaining <tr> elements, that would be splendid.


1 Answer

Editorial Team · Answer 1 · 2026-06-13T05:55:50+00:00

The following XPath will work:

//a[contains(',LQM:,LBR:,RTT:,',
             concat(',', ancestor::td/preceding-sibling::th, ','))]

In theory this can produce false positives: if a header's text itself contained commas (for example 'LQM:,LBR:'), it would also match.
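To see why the surrounding commas matter, the same membership test can be mimicked in plain Python. This is only an illustration: Python's `in` operator stands in for XPath's `contains()`, and the helper name `matches` is made up for this sketch.

```python
# Emulates contains(',LQM:,LBR:,RTT:,', concat(',', header, ','))
# "matches" is a hypothetical helper, not part of any library.
ALLOWED = ",LQM:,LBR:,RTT:,"


def matches(header: str) -> bool:
    """Return True if the comma-wrapped header occurs in the allowed list."""
    return ("," + header + ",") in ALLOWED


print(matches("LQM:"))       # True: exact code
print(matches("LQM"))        # False: the wrapping commas block partial matches
print(matches("LQM:,LBR:"))  # True: a header containing a comma slips through
```

The delimiters prevent substring hits such as 'QM:', but they cannot guard against a header that contains the delimiter itself.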

A stricter way to say it would be:

//a[ancestor::td/preceding-sibling::th[.='LQM:']]
|//a[ancestor::td/preceding-sibling::th[.='LBR:']]
|//a[ancestor::td/preceding-sibling::th[.='RTT:']]

I tested this by adding a <table> tag around your input and applying the following XSL transform:

<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/">
        <xsl:for-each select="//a[ancestor::td/preceding-sibling::th[.='LQM:']]
                                  |//a[ancestor::td/preceding-sibling::th[.='LBR:']]
                                  |//a[ancestor::td/preceding-sibling::th[.='RTT:']]">
            <xsl:text>
</xsl:text>
            <xsl:copy-of select="."/>
        </xsl:for-each>
    </xsl:template>

</xsl:transform>

It produces the following output (only the LQM row matches, since LBB: is not in the list):

<a href="docs/lqm.html"><b>Intro</b></a>
<a href="P/P79/">79</a>
<a href="P/P80/">80</a>
<a href="P/P81/">81</a>
<a href="P/P82/">82</a>

Of course, if you are using XSL, then you might find this construction more readable:

<xsl:for-each select="//a">
    <xsl:variable name="header" select="ancestor::td/preceding-sibling::th"/>

    <xsl:if test="$header = 'LQM:' or $header = 'LBR:' or $header = 'RTT:'">
        <xsl:text>
        </xsl:text>
        <xsl:copy-of select="."/>
    </xsl:if>
</xsl:for-each>
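Since the question also asks about splitting the query into smaller steps, here is a rough two-step sketch in Python. It uses only the standard library's `xml.etree.ElementTree`, so it assumes the markup is well-formed XML (real-world HTML often is not). The names `SAMPLE`, `WANTED`, and `select_links` are invented for this example, and the sample is a shortened copy of the rows from the question. Unlike the XPath `.='LQM:'` test, it strips surrounding whitespace and only looks at the text directly inside `<th>`.

```python
import xml.etree.ElementTree as ET

# Shortened sample rows from the question, wrapped in <table> so they parse as XML.
SAMPLE = """<table>
<tr>
  <th title="Library of Quintessential Memes">LQM:</th>
  <td>
    <a href="docs/lqm.html"><b>Intro</b></a>
    <a href="P/P79/">79</a>
  </td>
</tr>
<tr>
  <th title="Library of Boring Books">LBB:</th>
  <td>
    <a href="docs/lbb.html"><b>Intro</b></a>
    <a href="R/R80/">80</a>
  </td>
</tr>
</table>"""

# Header texts whose rows should be kept (assumed exact, colon included).
WANTED = {"LQM:", "LBR:", "RTT:"}


def select_links(markup, wanted=WANTED):
    """Step 1: keep rows whose <th> text is wanted. Step 2: collect their <a> hrefs."""
    root = ET.fromstring(markup)
    hrefs = []
    for tr in root.iter("tr"):
        th = tr.find("th")
        if th is None or (th.text or "").strip() not in wanted:
            continue  # skip rows without a matching header
        for td in tr.findall("td"):
            hrefs.extend(a.get("href") for a in td.iter("a"))
    return hrefs


print(select_links(SAMPLE))  # ['docs/lqm.html', 'P/P79/']
```

In Scrapy itself, the XPath expressions above can be passed to `response.xpath(...)` directly; the loop here only shows the row-then-link decomposition.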
