I have this code:
site = hxs.select("//h1[@class='state']")
log.msg(str(site[0].extract()),level=log.ERROR)
The output is:
[scrapy] ERROR: <h1 class="state"><strong>
1</strong>
<span> job containing <strong>php</strong> in <strong>region</strong> paying <strong>$30-40k per year</strong></span>
</h1>
Is it possible to get only the text, without any HTML tags?
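In your XPath above, you are selecting the h1 tag that has the class attribute state, which is why it returns everything inside the h1 element, markup included.

If you want only the text that sits directly inside the h1 tag, append /text() to the expression:

//h1[@class='state']/text()

If you want the text of the h1 tag as well as the text of the tags nested inside it, use //text() instead:

//h1[@class='state']//text()

So the difference is: /text() gives the text of the specific tag only, while //text() gives the text of that tag and of everything nested inside it. In your markup almost all of the text lives inside the strong and span tags, so the //text() form is the one you want.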