I have this code: site = hxs.select("//h1[@class='state']") log.msg(str(site[0].extract()),level=log.ERROR) The output is [scrapy] ERROR: <h1

Question


Asked: June 14, 2026 (2026-06-14T18:17:31+00:00)


I have this code

   site = hxs.select("//h1[@class='state']")
   log.msg(str(site[0].extract()),level=log.ERROR)

The output is

 [scrapy] ERROR: <h1 class="state"><strong>
            1</strong>
            <span> job containing <strong>php</strong> in <strong>region</strong> paying  <strong>$30-40k per year</strong></span>
                </h1>

Is it possible to get only the text, without any HTML tags?


1 Answer

Editorial Team · Answer 1 · 2026-06-14T18:17:32+00:00

//h1[@class='state']

The XPath above selects the h1 element whose class attribute is state,

so extract() returns the whole element, markup included.

If you want only the text that sits directly inside the h1 tag, use

//h1[@class='state']/text()

If you want the text of the h1 tag as well as the text of its child tags, use

//h1[@class='state']//text()

In short, /text() returns the text nodes that are direct children of the selected tag, while //text() returns the text nodes of the tag and of all its descendants.

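In your markup all of the visible text sits inside the strong and span children of the h1, so /text() on its own would return only whitespace. Use //text() instead:

site = ''.join(hxs.select("//h1[@class='state']//text()").extract()).strip()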
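The difference between /text() and //text() can be checked outside a Scrapy project. The sketch below is only an approximation: it uses the standard library's xml.etree.ElementTree as a stand-in for the Scrapy selector, wraps the markup from the question in a div so it parses as XML, and rebuilds the two XPath results by hand. The names HTML, direct_text and all_text are illustrative and are not part of Scrapy.

```python
import xml.etree.ElementTree as ET

# Markup from the question, wrapped in a root element so it parses as XML.
HTML = """<div><h1 class="state"><strong>
            1</strong>
            <span> job containing <strong>php</strong> in <strong>region</strong> paying  <strong>$30-40k per year</strong></span>
                </h1></div>"""

root = ET.fromstring(HTML)
h1 = root.find(".//h1[@class='state']")

# Rough equivalent of //h1[@class='state']/text():
# only the text nodes that are direct children of h1.
direct_text = [h1.text or ""] + [child.tail or "" for child in h1]

# Rough equivalent of //h1[@class='state']//text():
# the text nodes at any depth below h1.
all_text = list(h1.itertext())

# Collapse runs of whitespace so each result reads as a single line.
direct_joined = " ".join("".join(direct_text).split())
all_joined = " ".join("".join(all_text).split())

print(repr(direct_joined))  # ''
print(repr(all_joined))     # '1 job containing php in region paying $30-40k per year'
```

With this markup the direct text nodes hold nothing but whitespace, which is why the descendant form is the one that returns the sentence. The same " ".join(text.split()) step can be applied to the joined result of extract() if the line breaks inside the h1 are unwanted.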
