Consider the <p> tag in my HTML, which looks like this:
<div class ="summary">
<p>Best <a class="abch" href="/canvas">canvas</a> abcdefgh <a class="zph" href="/canvas">canvas</a>, I cycle them to garden</p>
</div>
When I do
site.select('.//*[contains(@class, "summary")]/p/text()').extract()
I get only the text directly inside <p>; the text of the hyperlinks is lost.
I want to extract the text of <p> as well as the text of the <a> tags inside it (e.g. canvas above). There can be any number of <a> tags inside the <p> element, and they may or may not be present.
Any idea how to extract all of the text?
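To make the symptom concrete, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree instead of the Scrapy selector from the question (an assumption, chosen only so the snippet runs without extra packages). It lists which text nodes sit directly under <p> and which sit inside the <a> tags. The helper name direct_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# The markup from the question (attribute spacing normalised).
HTML = (
    '<div class="summary">'
    '<p>Best <a class="abch" href="/canvas">canvas</a> abcdefgh '
    '<a class="zph" href="/canvas">canvas</a>, I cycle them to garden</p>'
    '</div>'
)


def direct_text(element):
    """Return the text nodes that are direct children of `element`."""
    parts = [element.text or ""]
    # ElementTree stores text that follows a child tag on that child's `.tail`.
    parts.extend(child.tail or "" for child in element)
    return [part for part in parts if part]


root = ET.fromstring(HTML)
p = root.find("p")

direct = direct_text(p)                       # text nodes directly under <p>
link_text = [a.text for a in p.findall("a")]  # text nodes inside the <a> tags

print(direct)
print(link_text)
```

The first list holds only the fragments around the links, which matches what the query above returns; the word canvas appears only in the second list, because it belongs to the <a> elements rather than to <p> itself.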
I think two slashes after p will work for you. One slash / selects children only; two slashes // also select deeper elements. Since the text nodes under a are not direct children of p, they are not selected.
Update:
Answering your comment: I can only think of this way: