The Archive Base

Editorial Team
Asked: June 18, 2026

I’m trying to download some images from a website using Python Scrapy.

Everything works fine except the process_item method in my pipeline, which is never called.

Here are my files:

Settings.py:

BOT_NAME = 'dealspider'
SPIDER_MODULES = ['dealspider.spiders']
NEWSPIDER_MODULE = 'dealspider.spiders'

DEFAULT_ITEM_CLASS = 'dealspider.items.DealspiderItem'

ITEM_PIPELINES = ['scrapy.contrib.pipeline.images.ImagesPipeline', 'dealspider.ImgPipeline.MyImagesPipeline']

IMAGES_STORE = '/Users/Comp/Desktop/projects/ndailydeals/dimages/full'
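Each ITEM_PIPELINES entry has to be a quoted string holding the full dotted path to a pipeline class, which Scrapy imports with its own helper (scrapy.utils.misc.load_object). The sketch below is a minimal stand-in for that lookup using only importlib; load_object_sketch is a hypothetical name, not Scrapy's code, and it is tried on a standard-library class so it runs without Scrapy installed.

```python
import importlib


def load_object_sketch(path):
    """Resolve 'package.module.ClassName' to the named object.

    Simplified, hypothetical stand-in for scrapy.utils.misc.load_object.
    """
    module_path, _, name = path.rpartition(".")
    if not module_path:
        # A bare name such as 'MyImagesPipeline' cannot be imported.
        raise ValueError("Not a full dotted path: %r" % path)
    module = importlib.import_module(module_path)  # e.g. 'dealspider.ImgPipeline'
    return getattr(module, name)                   # e.g. MyImagesPipeline


print(load_object_sketch("collections.OrderedDict"))  # <class 'collections.OrderedDict'>
```

If the module imports but the class name is misspelled, getattr raises AttributeError, so a wrong entry fails loudly rather than being silently skipped.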

ImgPipeline.py:

from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.exceptions import DropItem
from scrapy.http import Request


class MyImagesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        print("inside get_media_requests")
        # one download request per image URL collected by the spider
        for image_url in item['image_urls']:
            yield Request(image_url)

    def item_completed(self, results, item, info):
        # results holds one (success, info) pair per requested image
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        item['image_paths'] = image_paths
        print("inside item_completed")
        return item

    def process_item(self, item, spider):
        if spider.name == 'SgsnapDeal':
            print("inside process_item")
            # some code not relevant to the qn
            deal = DailyDeals(source_website_url=source_website_url, source_website_logo=source_website_logo, description=description, price=price, url=url, image_urls=image_urls, city=city, currency=currency)
            deal.save()
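The filtering in item_completed can be tried on its own. In Scrapy's media pipelines, results is a list of (success, info) pairs: on success info is a dict with keys such as 'url', 'path' and 'checksum', and on failure it is an error object (a Twisted Failure). This sketch copies the same logic into a plain function; the local DropItem class stands in for scrapy.exceptions.DropItem, and the URLs, paths and checksum are made-up sample data.

```python
class DropItem(Exception):
    """Local stand-in for scrapy.exceptions.DropItem (hypothetical copy)."""


def item_completed(results, item):
    """Same logic as MyImagesPipeline.item_completed, without Scrapy."""
    # Keep only the storage paths of images that downloaded successfully.
    image_paths = [x["path"] for ok, x in results if ok]
    if not image_paths:
        raise DropItem("Item contains no images")
    item["image_paths"] = image_paths
    return item


results = [
    (True, {"url": "http://example.com/a.jpg", "path": "full/abc.jpg", "checksum": "123"}),
    (False, Exception("download failed")),  # failed downloads are skipped
]
print(item_completed(results, {}))  # {'image_paths': ['full/abc.jpg']}
```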

I never see “inside process_item” when I run the crawler. I have also tried adding a process_item method to the scrapy.contrib.pipeline.images module (images.py) itself, but that doesn't work either:

def process_item(self, item, info):
    print("inside process")
    pass

The problem: everything else works. Images are downloaded and image_paths is set, and print statements confirm that get_media_requests and item_completed in MyImagesPipeline run, but process_item never does. Any help would be much appreciated.
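To make the pipeline contract concrete, here is a deliberately simplified model, written without Scrapy, of how an item moves through the enabled pipelines: each pipeline class gets its own process_item(item, spider) call in turn, and per Scrapy's documentation each call must return the item (or a Deferred) or raise DropItem. The Fake* classes and run_pipelines are hypothetical; Scrapy's real engine is asynchronous and more involved.

```python
from types import SimpleNamespace


class DropItem(Exception):
    """Local stand-in for scrapy.exceptions.DropItem."""


class FakeImagesPipeline(object):
    """Plays the images pipeline: fills in image_paths from image_urls."""

    def process_item(self, item, spider):
        if not item.get("image_urls"):
            raise DropItem("Item contains no images")
        item["image_paths"] = ["full/%d.jpg" % i for i in range(len(item["image_urls"]))]
        return item


class FakeSavePipeline(object):
    """Plays a separate database pipeline; records what it would save."""

    def __init__(self):
        self.saved = []

    def process_item(self, item, spider):
        if spider.name == "SgsnapDeal":
            self.saved.append(dict(item))
        return item  # pass the item on to any later pipeline


def run_pipelines(pipelines, item, spider):
    """Feed the item through each pipeline, passing each return value forward."""
    for pipeline in pipelines:
        try:
            item = pipeline.process_item(item, spider)
        except DropItem:
            return None  # a dropped item never reaches later pipelines
    return item


spider = SimpleNamespace(name="SgsnapDeal")
saver = FakeSavePipeline()
result = run_pipelines([FakeImagesPipeline(), saver], {"image_urls": ["a.jpg", "b.jpg"]}, spider)
print(result["image_paths"])  # ['full/0.jpg', 'full/1.jpg']
```

The point of the model is that the saving logic lives in its own class with its own process_item, so it runs independently of whatever the images pipeline does internally.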

EDIT:
Here are the other associated files:

spider:

from scrapy.spider import BaseSpider
from dealspider.items import DealspiderItem
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.pipeline.images import ImagesPipeline


class SG_snapDeal_Spider(BaseSpider):
    name = 'SgsnapDeal'
    allowed_domains = ['snapdeal.com']
    start_urls = [
        'http://www.snapdeal.com',
        ]

    def parse(self, response):
        item = DealspiderItem()

        hxs = HtmlXPathSelector(response)
        description = hxs.select('/html/body/div/div/div/div/div/div/div/div/div/a/div/div/text()').extract()  
        price = hxs.select('/html/body/div/div/div/div/div/div/div/div/div/a/div/div/div/span/text()').extract()
        url = hxs.select('/html/body/div/div/div/div/div/div/div/div/div/a/@href').extract()
        image_urls = hxs.select('/html/body/div/div/div/div/div/div/div/div/div/a/div/div/img/@src').extract()

        item['description'] = description
        item['price'] = price
        item['url'] = url
        item['image_urls'] = image_urls
        #works fine!!
        return item

SPIDER = SG_snapDeal_Spider()

Items.py:

from scrapy.item import Item, Field

class DealspiderItem(Item):
    description = Field()
    price = Field()
    url = Field()
    image_urls = Field()
    images = Field()
    image_paths = Field()
1 Answer

    Editorial Team
    Answered: June 18, 2026 at 8:18 am

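    You need to put process_item in a separate pipeline that saves your item to the database, not in the images pipeline. ImagesPipeline already has its own process_item (inherited from MediaPipeline), which is what drives get_media_requests and item_completed, so the database logic belongs in a pipeline of its own.

    Make a separate pipeline like this: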

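    class OtherPipeline(object):
        def process_item(self, item, spider):
            print("inside process")
            # save the item to your database here
            return item  # return the item so later pipelines still receive it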
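The same idea fleshed out into a fuller sketch: a standalone pipeline that writes each item to a database. sqlite3 from the standard library stands in for the asker's DailyDeals model; the class name DealsDatabasePipeline, the table and its columns are hypothetical. open_spider and close_spider are optional hooks that Scrapy calls once per spider.

```python
import sqlite3


class DealsDatabasePipeline(object):
    """Hypothetical stand-alone pipeline that stores each deal in SQLite."""

    def __init__(self, db_path=":memory:"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Optional hook: Scrapy calls it once when the spider is opened.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS deals "
            "(description TEXT, price TEXT, url TEXT, image_paths TEXT)"
        )

    def close_spider(self, spider):
        # Optional hook: Scrapy calls it once when the spider is closed.
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # Called once per item, after the pipelines that run before this one.
        if spider.name == "SgsnapDeal":
            # The spider stores extracted values as lists, so join them into text.
            row = tuple(
                ", ".join(item.get(field) or [])
                for field in ("description", "price", "url", "image_paths")
            )
            self.conn.execute("INSERT INTO deals VALUES (?, ?, ?, ?)", row)
        return item  # always hand the item on (or raise DropItem to discard it)
```

Because this class only reads image_paths and never touches the download machinery, it can be enabled or disabled without affecting the images pipeline.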
    

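    Then add that pipeline to ITEM_PIPELINES in your settings file, after the images pipeline, so that image_paths is already set when the item is saved.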
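On ordering: in the list form used in the question, pipelines run in the order they are listed. Later Scrapy releases replaced the list with a dict mapping each dotted path to an integer, conventionally in the 0-1000 range, and items pass from lower to higher values. A small sketch of that rule; the module path for OtherPipeline is hypothetical, and pipeline_run_order only mimics the sorting, it is not a Scrapy function.

```python
# Hypothetical settings in the dict form used by later Scrapy releases:
# {dotted path: order}; lower numbers run first.
ITEM_PIPELINES = {
    "dealspider.ImgPipeline.MyImagesPipeline": 1,
    "dealspider.pipelines.OtherPipeline": 800,  # hypothetical module path
}


def pipeline_run_order(pipelines):
    """Return the dotted paths sorted into the order their process_item runs."""
    return sorted(pipelines, key=pipelines.get)


print(pipeline_run_order(ITEM_PIPELINES))
```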

Related Questions

I am trying to find ID3V2 tags from MP3 file using jid3lib in Java.
I'm trying to convert HTML to plain text. I get many &#8217; &#8220; etc.
I am using jsonparser to parse data and images obtained from json response. When
I am using JSon response to parse title,date content and thumbnail images and place
I'm trying to decode HTML entries from here NYTimes.com and I cannot figure out
I'm new to using the Perl treebuilder module for HTML parsing and can't figure
Basically, what I'm trying to create is a page of div tags, each has
I am trying to understand how to use SyndicationItem to display feed which is
link Im having trouble converting the html entites into html characters, (&# 8217;) i
That's pretty much it. I'm using Nokogiri to scrape a web page what has

© 2021 The Archive Base. All Rights Reserved