I am using this scrapy code snippet to render javascript code of the website

Asked: June 16, 2026

I am using this Scrapy code snippet to render the JavaScript of the website I want to crawl. The site is a video search engine, and its search results are rendered by JavaScript. I want to follow the next-page links and scrape all of the search results. Here is my spider code:

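import urlparse

from scrapy import log
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider

# VideoItem is the project's Item subclass; its import is not shown in the question.

class VideoSpider(BaseSpider):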
    name = "VideoSpider"
    allowed_domains = ["domain.com"]
    start_urls = ['video search results link']

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        video_items = hxs.select("//ul[@id='results-list']/li[@class='result']")
        #items = []
        for vi in video_items:
            item = VideoItem()
            link = vi.select("a[@class='result-link']/@href").extract()[0]
            title = vi.select("a[@class='result-link']/@title").extract()[0]
            #print title,link
            item['title'] = title
            item['url'] = link
            yield item

        next_page = hxs.select("//div[@id='page']/a")
        for np in next_page:
            next_url = np.select("@href").extract()
            if next_url:
                url = urlparse.urljoin(response.url, next_url[0])
                #url = response.url, str(next_page)
                self.log("find next page url: %s"%url, log.INFO)
                yield Request(url, callback=self.parse)

I found that the link in start_urls is downloaded and rendered correctly, like this:

<ul id="results-list" class="clearfix" static="bl=normal">
    <li class="result" href="" </li>
     <li class="result" href="" </li>
     <li class="result" href="" </li>
    ....

So extraction succeeds on the first page. But when the next-page links are fetched, the JavaScript is not rendered, and the response looks like this:

<ul id="results-list" class="clearfix" static="bl=normal"></ul>
    <div id="loading">trying to load page for you, please be patient</div>

So the scraping stops, because the links cannot be extracted when results-list is not rendered. Why is the first page rendered properly but not the second? Should I use Selenium instead of webkit and jswebkit?

1 Answer

Editorial Team · Answer 1 · 2026-06-16T19:24:39+00:00

I finally figured out the problem: some of the URLs were not properly formed.
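The answer does not say what exactly was malformed, so the following is only a sketch of one way to guard against bad next-page links before requesting them. It assumes the usual culprits (stray whitespace, unescaped spaces, non-HTTP hrefs such as `javascript:`), uses only the Python 3 standard library, and the helper name `build_next_url` is hypothetical, not part of Scrapy.

```python
from urllib.parse import urljoin, urlsplit


def build_next_url(base_url, href):
    """Return an absolute http(s) URL for href, or None if it is unusable.

    Hypothetical helper: the checks below are assumptions about what
    "not properly formed" might mean, not the original poster's fix.
    """
    if href is None:
        return None
    href = href.strip()  # hrefs extracted from HTML often carry whitespace
    if not href or href.startswith("#"):
        return None  # empty link or in-page anchor, nothing to fetch
    if href.lower().startswith("javascript:"):
        return None  # script pseudo-link, not a page
    # Resolve relative links against the page URL, then escape raw spaces.
    url = urljoin(base_url, href).replace(" ", "%20")
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return url


if __name__ == "__main__":
    base = "http://domain.com/search?q=cats"
    for raw in ["?q=cats&page=2", " /search?q=funny cats&page=2\n",
                "javascript:void(0)", ""]:
        print(repr(raw), "->", build_next_url(base, raw))
```

In the spider, the result would be checked before yielding a request, for example by skipping the link and logging it whenever the helper returns None. Logging the rejected hrefs is also a quick way to see which links on the page are actually malformed.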
