I’m trying to get some images from a website source using python scrapy. The

Question

Editorial Team

Asked: June 18, 2026 (2026-06-18T08:18:39+00:00)

I’m trying to get some images from a website source using python scrapy.

Everything works fine except the process_item method in my pipeline, which is never called.

Here are my files:

Settings.py:

BOT_NAME = 'dealspider'
SPIDER_MODULES = ['dealspider.spiders']
NEWSPIDER_MODULE = 'dealspider.spiders'

DEFAULT_ITEM_CLASS = 'dealspider.items.DealspiderItem'

ITEM_PIPELINES = ['scrapy.contrib.pipeline.images.ImagesPipeline', 'dealspider.ImgPipeline.MyImagesPipeline']

IMAGES_STORE = '/Users/Comp/Desktop/projects/ndailydeals/dimages/full'

ImgPipeline:

class MyImagesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        print "inside get_media_requests"
        for image_url in item['image_urls']:

            yield Request(image_url)

    def item_completed(self, results, item, info):

        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        item['image_paths'] = image_paths
        print "inside item_completed"
        return item



    def process_item(self, item, spider):
        if spider.name == 'SgsnapDeal':
            print "inside process_item"
            # some code not relevant to the qn
            deal = DailyDeals(source_website_url=source_website_url, source_website_logo=source_website_logo, description=description, price=price, url=url, image_urls=image_urls, city=city, currency=currency)
            deal.save()

I never see “inside process_item” when I run the crawler. I also tried adding a process_item function directly to scrapy/contrib/pipeline/images.py, but that doesn't work either:

def process_item(self, item, info):
    print "inside process"
    pass

The problem: everything else works. The images are downloaded and image_paths is set. I know from the print statements that get_media_requests and item_completed in MyImagesPipeline are called, but process_item is not. Any help would be much appreciated.

EDIT:
Here are the other associated files:

spider:

from scrapy.spider import BaseSpider
from dealspider.items import DealspiderItem
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.pipeline.images import ImagesPipeline


class SG_snapDeal_Spider(BaseSpider):
    name = 'SgsnapDeal'
    allowed_domains = ['snapdeal.com']
    start_urls = [
        'http://www.snapdeal.com',
        ]

    def parse(self, response):
        item = DealspiderItem()

        hxs = HtmlXPathSelector(response)
        description = hxs.select('/html/body/div/div/div/div/div/div/div/div/div/a/div/div/text()').extract()  
        price = hxs.select('/html/body/div/div/div/div/div/div/div/div/div/a/div/div/div/span/text()').extract()
        url = hxs.select('/html/body/div/div/div/div/div/div/div/div/div/a/@href').extract()
        image_urls = hxs.select('/html/body/div/div/div/div/div/div/div/div/div/a/div/div/img/@src').extract()

        item['description'] = description
        item['price'] = price
        item['url'] = url
        item['image_urls'] = image_urls
        #works fine!!
        return item

SPIDER = SG_snapDeal_Spider()

Items.py:

from scrapy.item import Item, Field

class DealspiderItem(Item):
    description = Field()
    price = Field()
    url = Field()
    image_urls = Field()
    images = Field()
    image_paths = Field()

1 Answer

Editorial Team · Answer 1 · 2026-06-18T08:18:40+00:00

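You need to put process_item in a separate pipeline that saves your item to the database, not in the images pipeline. The images pipeline already relies on its own process_item to drive get_media_requests and item_completed, so it is not the place for your database code.

Create the separate pipeline like this: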

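class OtherPipeline(object):
    def process_item(self, item, spider):
        # Scrapy calls this with the item and the spider that produced it
        print("inside process")
        # return the item so any later pipeline still receives it
        return item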
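
To illustrate what such a separate pipeline might look like once it does real work, here is a minimal sketch. It does not import Scrapy, so it can be run on its own; `SaveDealPipeline`, `FakeSpider` and the in-memory `saved` list are hypothetical stand-ins (the list takes the place of the `DailyDeals(...).save()` call from the question). It uses Python 3 syntax.

```python
class FakeSpider:
    """Hypothetical stand-in for a Scrapy spider; only `name` is used here."""

    def __init__(self, name):
        self.name = name


class SaveDealPipeline:
    """Hypothetical pipeline that stores items after the images pipeline ran."""

    def __init__(self):
        # Stand-in for the database; the question calls DailyDeals(...).save().
        self.saved = []

    def process_item(self, item, spider):
        # Only store items that come from the spider named in the question.
        if spider.name == 'SgsnapDeal':
            self.saved.append(dict(item))
        # Always return the item so later pipelines still receive it.
        return item


pipeline = SaveDealPipeline()
item = {'description': ['Some deal'], 'image_paths': ['full/abc.jpg']}
result = pipeline.process_item(item, FakeSpider('SgsnapDeal'))
```

Because this pipeline would run after the images pipeline, `image_paths` should already be set on the item by the time `process_item` sees it.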

Then include that pipeline in ITEM_PIPELINES in your settings file.
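
As a rough sketch, the settings entry could look like the following. It keeps the list form used in the question's settings; the module path `dealspider.pipelines` is an assumption, so adjust it to wherever OtherPipeline actually lives. The images pipeline is listed first so that, assuming pipelines run in the listed order, the images are handled before the item is saved.

```python
# settings.py (fragment)
# 'dealspider.pipelines' is an assumed module path for OtherPipeline.
ITEM_PIPELINES = [
    'dealspider.ImgPipeline.MyImagesPipeline',  # downloads images, sets image_paths
    'dealspider.pipelines.OtherPipeline',       # saves the item afterwards
]
```

Newer Scrapy releases expect ITEM_PIPELINES to be a dict that maps each pipeline path to an integer order instead of a list, so check which form your version uses.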
