The website I am scraping has javascript that sets a cookie and checks it

Asked: June 10, 2026 (2026-06-10T04:19:50+00:00)

The website I am scraping has JavaScript that sets a cookie, and the backend checks that cookie to make sure JS is enabled. Extracting the cookie from the HTML is simple enough, but setting it seems to be a problem in Scrapy. My code is:

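```python
import re

from scrapy.contrib.spiders import Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders.init import InitSpider
from scrapy.http import Request


class TestSpider(InitSpider):
    # ... (elided in the original)
    rules = (Rule(SgmlLinkExtractor(allow=(r'products/./index\.html', )), callback='parse_page'),)

    def init_request(self):
        # First request: fetch the page whose JavaScript sets the cookie.
        return Request(url=self.init_url, callback=self.parse_js)

    def parse_js(self, response):
        # Pull name and value out of the JS call setCookie('name', 'value', ...).
        match = re.search(r"setCookie\('(.+?)',\s*?'(.+?)',", response.body, re.M)
        if match:
            cookie = match.group(1)
            value = match.group(2)
        else:
            raise BaseException("Did not find the cookie", response.body)
        # Request a page that checks the cookie, sending the cookie along.
        return Request(url=self.test_page, callback=self.check_test_page, cookies={cookie: value})

    def check_test_page(self, response):
        if 'Welcome' in response.body:
            # The JS check passed. InitSpider.initialized() returns the
            # postponed start requests, so its result must be returned.
            return self.initialized()

    def parse_page(self, response):
        pass  # scraping... (elided in the original)
```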
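The extraction step in parse_js can be checked on its own, outside Scrapy. A minimal sketch using the same regular expression; the helper name extract_js_cookie and the sample cookie name and value are hypothetical. It works on a str, so under Python 3 the decoded page text (for example response.text in newer Scrapy versions) rather than the raw bytes in response.body would be passed in.

```python
import re

# Same pattern as in parse_js: captures the first two quoted arguments
# of a setCookie('name', 'value', ...) call in the page's JavaScript.
SET_COOKIE_RE = re.compile(r"setCookie\('(.+?)',\s*?'(.+?)',")


def extract_js_cookie(html):
    """Return (name, value) from the first setCookie(...) call, or None."""
    match = SET_COOKIE_RE.search(html)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(extract_js_cookie("<script>setCookie('jscheck', 'abc123', 30);</script>"))
# ('jscheck', 'abc123')
print(extract_js_cookie("<p>no script here</p>"))
# None
```

Returning None instead of raising leaves it to the caller (here, parse_js) to decide how a missing cookie should be handled.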

I can see that the content is available in check_test_page, so the cookie works perfectly. But the spider never even gets to parse_page, because without the right cookie the CrawlSpider rules don't see any links. Is there a way to set a cookie for the duration of the scraping session? Or do I have to use BaseSpider and add the cookie to every request manually?
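Adding the cookie to every request manually mostly comes down to building the same cookies dict for each Request. A minimal sketch, assuming the (name, value) pair found in parse_js is kept around on the spider; with_session_cookie is a hypothetical helper, not part of Scrapy. Its result would be passed as the cookies argument, e.g. Request(url, callback=self.parse_page, cookies=with_session_cookie(self.session_cookie)), where self.session_cookie is likewise a hypothetical attribute.

```python
def with_session_cookie(session_cookie, extra=None):
    """Build the cookies dict for one request.

    session_cookie: the (name, value) pair extracted from the JS check.
    extra: optional per-request cookies.
    The session cookie is applied last, so it is always present and
    overrides a same-named entry in extra. A fresh dict is returned each
    time so requests never share one mutable mapping.
    """
    cookies = dict(extra or {})
    name, value = session_cookie
    cookies[name] = value
    return cookies


print(with_session_cookie(("jscheck", "abc123")))
# {'jscheck': 'abc123'}
print(with_session_cookie(("jscheck", "abc123"), {"lang": "en"}))
# {'lang': 'en', 'jscheck': 'abc123'}
```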

A less desirable alternative would be to set the cookie (its value never seems to change) through Scrapy's configuration files somehow. Is that possible?

1 Answer

Editorial Team · Answer 1 · 2026-06-10T04:19:52+00:00

It turned out that InitSpider is a BaseSpider, not a CrawlSpider, so its rules attribute is never used. So it looks like 1) there's no way to use CrawlSpider in this situation, and 2) there's no way to set a sticky cookie.
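Under that conclusion, the question's rules have to be done by hand: collect the links on each page, keep the ones matching the allow pattern, and issue follow-up requests that carry the cookie. A minimal stdlib-only sketch of the link-filtering part, offered as a rough stand-in for what the Rule/SgmlLinkExtractor pair would have done; follow_links, the example URLs and the cookie values are hypothetical. In a spider, each returned pair would become Request(url, callback=self.parse_page, cookies=cookies).

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Same allow pattern as the question's Rule(SgmlLinkExtractor(allow=...)).
ALLOW_RE = re.compile(r"products/./index\.html")


class _HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def follow_links(base_url, html, session_cookie):
    """Return (absolute_url, cookies) pairs for unique links matching ALLOW_RE."""
    parser = _HrefCollector()
    parser.feed(html)
    name, value = session_cookie
    results = []
    seen = set()
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        if ALLOW_RE.search(url) and url not in seen:
            seen.add(url)
            # Each follow-up request gets its own copy of the cookie.
            results.append((url, {name: value}))
    return results


page = '<a href="/products/a/index.html">A</a> <a href="/about.html">About</a>'
print(follow_links("http://example.com/", page, ("jscheck", "abc123")))
# [('http://example.com/products/a/index.html', {'jscheck': 'abc123'})]
```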