Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 9215851
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 18, 20262026-06-18T02:17:42+00:00 2026-06-18T02:17:42+00:00

I have a CrawlSpider, the code is below. I use Tor through tsocks. When

  • 0

I have a CrawlSpider, the code is below. I use Tor through tsocks.
When I start my spider, everything works fine. Using init_request I can login on site and crawl with sticky cookies.

But problem occurred when I stopped and resumed spider. Cookies became not sticky.

I give you the response from Scrapy.

=======================INIT_REQUEST================
2013-01-30 03:03:58+0300 [my] INFO: Spider opened
2013-01-30 03:03:58+0300 [my] INFO: Resuming crawl (675 requests scheduled)
............ And here crawling began

So… callback=self.login_url in def init_request is not fired!!!

I thought that scrapy engine didn’t want to send again request on login page. Before resuming scrapy I changed login_page (I can login from every page on site) to different that not included in restrict_xpaths.

Result is – After resuming I cannot login and previous cookies are lost.

Does anyone have some assumptions?

from scrapy.contrib.spiders import CrawlSpider,Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.loader import XPathItemLoader
from scrapy.contrib.loader.processor import TakeFirst, MapCompose, Join, Identity
from beles_com_ua.items import Product
from scrapy.utils.response import get_base_url
from scrapy.utils.url import urljoin_rfc
from scrapy.utils.markup import remove_entities
from django.utils.html import strip_tags
from datetime import datetime
from scrapy import log
import re
from scrapy.http import Request, FormRequest

class ProductLoader(XPathItemLoader):
    .... some code is here ...


class MySpider(CrawlSpider):
    name = 'my'
    login_page = 'http://test.com/index.php?section=6&type=12'

    allowed_domains = ['test.com']
    start_urls = [
        'http://test.com/index.php?section=142',
    ]
    rules = (
        Rule(SgmlLinkExtractor(allow=('.',),restrict_xpaths=('...my xpath...')),callback='parse_item', follow=True),
    )
    def start_requests(self):
        return self.init_request()

    def init_request(self):
        print '=======================INIT_REQUEST================'
        return [Request(self.login_page, callback=self.login_url)]


    def login_url(self, response):
        print '=======================LOGIN======================='
        """Generate a login request."""
        return FormRequest.from_response(response,
            formdata={'login': 'mylogin', 'pswd': 'mypass'},
            callback=self.after_login)

    def after_login(self, response):
        print '=======================AFTER_LOGIN ...======================='
        if "images/info_enter.png" in response.body:
               print "==============Bad times :(==============="
        else:
           print "=========Successfully logged in.========="
           for url in self.start_urls:
            yield self.make_requests_from_url(url)

    def parse_item(self, response):
        hxs = HtmlXPathSelector(response)

        entry = hxs.select("//div[@class='price']/text()").extract()
        l = ProductLoader(Product(), hxs)
        if entry:
        name = hxs.select("//div[@class='header_box']/text()").extract()[0]
        l.add_value('name', name)
        ... some code is here ...
        return l.load_item()
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-18T02:17:43+00:00Added an answer on June 18, 2026 at 2:17 am

    The init_request(self): is available only when you subclass from InitSpider not CrawlSpider

    You need to subclass your spider from InitSpider like this

    class WorkingSpider(InitSpider):
    
        login_page = 'http://www.example.org/login.php'
        def init_request(self):
            #"""This function is called before crawling starts."""
            return Request(url=self.login_page, callback=self.login)
    

    But then remember that you can’t define Rules in initSpider as its only avaiable in CrawlSpider you need to manually extract the links

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

HI, I'm have this short spider code: class TestSpider(CrawlSpider): name = test allowed_domains =
Have a fun issue with sharepoint calendar view filtering. That code works fine: SPSecurity.RunWithElevatedPrivileges(delegate()
I have created a spider which inherits from CrawlSpider. I need to use the
I am using CrawlSpider to crawl and extract data from a webpage. The start
I have spider that I have written using the Scrapy framework. I am having
Have got a method which returns IEnumerable<User> which I have been using Linq /
have defined a Java Interface with the annotation @WebService Compiled the code which went
Have started working my way through a book called WebGL: Up and Running, which
Have created a ATL COM project through which I am inserting Menu Items to
Have have this line of code in my form when I create a new

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.