I am having a problem with Scrapy: for some reason it is not entering my parse method, and I have no idea why. I have tried different options without success.
This is how my code looks now. Specifically, there are two print statements, and the one in the parse() method is never executed.
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy import log
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from comments.items import CustomerReview
import re

class AppidSpider(BaseSpider):
    name = "appid"
    allowed_domains = ["itunes.apple.com"]
    start_urls = [
        "http://itunes.apple.com/us/genre/ios/id36?mt=8"
    ]
    rules = [Rule(SgmlLinkExtractor(), follow=True, callback='parse')]
    print "---> THIS IS TEST 1"

    def parse(self, response):
        print " ----> THIS IS TEST 2"
        # ... More code afterwards
And this is the output. As you can see, TEST 2 is never printed.
$ scrapy crawl appid
2012-07-05 13:41:02+0000 [scrapy] INFO: Scrapy 0.14.4 started (bot: comments)
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, MemoryUsage, SpiderState
---> THIS IS TEST 1
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Enabled item pipelines: FilterWordsPipeline
2012-07-05 13:41:02+0000 [appid] INFO: Spider opened
2012-07-05 13:41:02+0000 [appid] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2012-07-05 13:41:02+0000 [appid] DEBUG: Crawled (200) <GET http://itunes.apple.com/us/genre/ios/id36?mt=8> (referer: None)
2012-07-05 13:41:02+0000 [appid] INFO: Closing spider (finished)
2012-07-05 13:41:02+0000 [appid] INFO: Dumping spider stats:
{'downloader/request_bytes': 222,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 9927,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2012, 7, 5, 13, 41, 2, 694678),
'scheduler/memory_enqueued': 1,
'start_time': datetime.datetime(2012, 7, 5, 13, 41, 2, 604025)}
2012-07-05 13:41:02+0000 [appid] INFO: Spider closed (finished)
2012-07-05 13:41:02+0000 [scrapy] INFO: Dumping global stats:
{'memusage/max': 95318016, 'memusage/startup': 95318016}
Why do you pass parse as a string? Try callback=self.parse instead.
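One caveat worth checking before making that change: rules is defined in the class body, where no self exists yet, so writing self.parse there raises a NameError. Scrapy's Rule documentation also says a callback may be given as a string, in which case the method with that name is looked up on the spider. The sketch below illustrates both points in plain Python, without importing Scrapy. FakeRule, FakeSpider, BrokenSpider, parse_page and resolve are hypothetical stand-ins invented for the illustration, not Scrapy APIs.

```python
# Hypothetical stand-ins (FakeRule, FakeSpider, BrokenSpider); not Scrapy classes.
class FakeRule(object):
    def __init__(self, callback=None):
        self.callback = callback


def define_spider_with_self():
    # `self` does not exist while a class body runs, so this raises NameError.
    class BrokenSpider(object):
        rules = [FakeRule(callback=self.parse)]

        def parse(self, response):
            return response
    return BrokenSpider


class FakeSpider(object):
    # A string works at class level: it is only a name to look up later.
    rules = [FakeRule(callback='parse_page')]

    def parse_page(self, response):
        return "parsed %s" % response

    def resolve(self, rule):
        # Turn a stored method name into a bound method on this instance.
        cb = rule.callback
        return getattr(self, cb) if isinstance(cb, str) else cb


try:
    define_spider_with_self()
    error = None
except NameError as exc:
    error = exc  # name 'self' is not defined

spider = FakeSpider()
callback = spider.resolve(FakeSpider.rules[0])
result = callback("page")
```

So the string form is not necessarily the problem. Two other things may be worth ruling out: rules are only processed by CrawlSpider, not by BaseSpider, and the Scrapy documentation advises against using parse as a rule callback because CrawlSpider uses parse internally.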