Consider the tag in my html is like this <div class =summary> <p>Best <a

Question

Editorial Team

Asked: May 26, 2026 (2026-05-26T07:30:21+00:00)

Consider that the tag in my HTML looks like this:

<div class ="summary">
    <p>Best <a class="abch" href="/canvas">canvas</a> abcdefgh <a class="zph" href="/canvas">canvas</a>, I cycle them to garden</p>
</div>

When I do

site.select('.//*[contains(@class, "summary")]/p/text()').extract()

I get only the text that sits directly inside the <p> element, and the text of the hyperlinks is lost.
I want to extract the text of the <p> element as well as the text of the <a> elements inside it (e.g. "canvas" above). There can be any number of <a> tags inside the <p> element, and they may or may not be present.

Any idea how to extract all of the text?


1 Answer

Editorial Team · Answer 1 · 2026-05-26T07:30:21+00:00

I think two slashes after p will work for you. A single slash (/) selects direct children only, while a double slash (//) also selects descendants at any depth. The text nodes inside the a elements are not direct children of p, so p/text() does not select them.

site.select('.//*[contains(@class, "summary")]/p//text()').extract()
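To see the difference without Scrapy installed, here is a minimal sketch that uses only Python's standard library instead of the selector API from the question. The helper names direct_text and all_text are hypothetical; they imitate what p/text() and p//text() return, and the snippet assumes the markup is well-formed enough to parse as XML.

```python
import xml.etree.ElementTree as ET

# The markup from the question, written as one well-formed string.
HTML = (
    '<div class="summary">'
    '<p>Best <a class="abch" href="/canvas">canvas</a> abcdefgh '
    '<a class="zph" href="/canvas">canvas</a>, I cycle them to garden</p>'
    '</div>'
)


def direct_text(elem):
    """Text nodes that are direct children of elem (imitates elem/text())."""
    parts = [elem.text] + [child.tail for child in elem]
    return [t for t in parts if t]


def all_text(elem):
    """Text nodes at any depth under elem (imitates elem//text())."""
    return list(elem.itertext())


root = ET.fromstring(HTML)
p = root.find("p")

print(direct_text(p))  # the link text "canvas" is missing here
print(all_text(p))     # the link text is included here
print("".join(all_text(p)))
```

Joining the second list gives the full sentence, including both link texts.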

Update:

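In reply to your comment, the only way I can think of is to loop over the paragraphs:

for p in site.select('.//*[contains(@class, "summary")]/p'):
    # The leading dot keeps the query relative to this p;
    # a bare //text() would match every text node in the document.
    texts = p.select('.//text()').extract()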
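The per-paragraph idea can also be sketched with the standard library alone. This is an illustration under assumptions, not Scrapy code: the two-paragraph document below is made up for the example, and because ElementTree's limited XPath has no contains(), the class attribute is matched exactly.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one paragraph with a link, one without.
DOC = (
    '<body>'
    '<div class="summary">'
    '<p>Best <a href="/canvas">canvas</a> abcdefgh</p>'
    '<p>No links here</p>'
    '</div>'
    '</body>'
)

root = ET.fromstring(DOC)

# One string per paragraph; itertext() walks text nodes at any depth,
# so paragraphs with and without <a> children are handled the same way.
paragraphs = [
    "".join(p.itertext())
    for p in root.findall(".//div[@class='summary']/p")
]

print(paragraphs)
```

Keeping one string per paragraph preserves the grouping that a single flat list of text nodes would lose.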
