I want to extract an element if the previous element's text() matches specific criteria.

Asked: June 9, 2026

I want to extract an element if the previous element's text() matches specific criteria. For example:

<html>
<div>
<table class="layouttab">
    <tbody>
    <tr>
        <td scope="row" class="srb">General information:&nbsp;&nbsp;</td>
        <td>(xxx) yyy-zzzz</td>
    </tr>
    <tr>
        <td scope="row" class="srb">Website:&nbsp;&nbsp;</td>
        <td><a href="http://xyz.edu" target="_blank">http://www.xyz.edu</a>
        </td>
    </tr>
    <tr>
        <td scope="row" class="srb">Type:&nbsp;&nbsp;</td>
        <td>4-year, Private for-profit</td>
    </tr>
    <tr>
        <td scope="row" class="srb">Awards offered:&nbsp;&nbsp;</td>
        <td>Less than one year certificate<br>One but less than two years certificate<br>Associate's degree<br>Bachelor's
            degree
        </td>
    </tr>
    <tr>
        <td scope="row" class="srb">Campus setting:&nbsp;&nbsp;</td>
        <td>City: Small</td>
    </tr>
    <tr>
        <td scope="row" class="srb">Related Institutions:</td>
        <td><a href="?q=xyz">xyz-New York</a>
            (Parent):
            <ul>
                <li style="list-style:circle">Berkeley College - Westchester Campus</li>
            </ul>
        </td>
    </tr>
    </tbody>
</table>
</div>
</html>

Now I want to extract the URL when the preceding cell's text() contains "Website:".
I am using Python 2.x with Scrapy 0.14. I was able to extract data by addressing individual elements by position, such as:

 item['Header_Type']= site.select('div/table[@class="layouttab"]/tr[3]/td[2]/text()').extract()

But this approach fails if the Website row is missing: the later rows shift up by one, so I get 'Type' in the website field and 'Awards offered' in Type.

Is there a specific construct in XPath for this, something like:

'div/table[@class="layouttab"]/tr/td[2] {if td[1] has text = "Website"}

Thanks in advance.


1 Answer

Editorial Team · Answer 1 · 2026-06-09T05:22:03+00:00

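With Python and Scrapy, select the value cell by its label instead of by row position.
The following selects the "Type" field and worked well for me. For the URL, match on "Website" instead of "Type" and end the path with a/@href instead of text().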

item['Header_Type']= site.select('div[1]/table[@class="layouttab"]/tr/td[contains(text(),"Type")]/following-sibling::td[1]/text()').extract()
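The same label-based lookup can be sketched outside Scrapy using only the Python standard library. This is an illustration, not the Scrapy API: the standard library's ElementTree does not support contains() or following-sibling, so the loop below stands in for that XPath. The helper names value_cell_for and website_url are hypothetical, and the sample markup is a trimmed, well-formed copy of the table from the question (entities and <br> tags removed so the XML parser accepts it).

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the question's table (an assumption made
# so that the stdlib XML parser can read it).
SAMPLE = """
<div>
<table class="layouttab">
    <tr>
        <td scope="row" class="srb">General information:</td>
        <td>(xxx) yyy-zzzz</td>
    </tr>
    <tr>
        <td scope="row" class="srb">Website:</td>
        <td><a href="http://xyz.edu" target="_blank">http://www.xyz.edu</a></td>
    </tr>
    <tr>
        <td scope="row" class="srb">Type:</td>
        <td>4-year, Private for-profit</td>
    </tr>
</table>
</div>
"""


def value_cell_for(root, label):
    """Return the <td> following the cell whose text contains `label`, or None."""
    for row in root.iter("tr"):
        cells = row.findall("td")
        # Match on the label cell's text, never on the row's position.
        if len(cells) >= 2 and label in (cells[0].text or ""):
            return cells[1]
    return None


def website_url(root):
    """Return the href of the link in the 'Website' row, or None if the row is absent."""
    cell = value_cell_for(root, "Website")
    link = cell.find("a") if cell is not None else None
    return link.get("href") if link is not None else None


root = ET.fromstring(SAMPLE)
print(website_url(root))
print(value_cell_for(root, "Type").text)
```

Because each value is found through its label, a missing Website row yields None for the URL and leaves the "Type" lookup unaffected, which is the failure the positional tr[3] path ran into.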
