The Archive Base

Editorial Team
Asked: June 17, 2026

I need to scrape some websites and download some images from them. When the image is a *.jpg I don't have any problem, but these sites have *.svg images too, and I need those as well.

Has anyone done this before?

Here is the shell output with the error:

2013-01-18 14:44:10-0600 [crawler] DEBUG: Image (downloaded): Downloaded image from <GET http://page/image.svg> referred in <None>
2013-01-18 14:44:10-0600 [crawler] Unhandled Error

Traceback (most recent call last):
File "/virtualenvs/asd/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 576, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
File "/virtualenvs/asd/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 381, in callback
        self._startRunCallbacks(result)
File "/virtualenvs/asd/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 489, in _startRunCallbacks
        self._runCallbacks()
File "/virtualenvs/asd/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 576, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
    --- <exception caught here> ---
File "/virtualenvs/asd/local/lib/python2.7/site-packages/Scrapy-0.16.3-py2.7.egg/scrapy/contrib/pipeline/images.py", line 199, in media_downloaded
        checksum = self.image_downloaded(response, request, info)
File "/virtualenvs/asd/local/lib/python2.7/site-packages/Scrapy-0.16.3-py2.7.egg/scrapy/contrib/pipeline/images.py", line 252, in image_downloaded
        for key, image, buf in self.get_images(response, request, info):
File "/virtualenvs/asd/local/lib/python2.7/site-packages/Scrapy-0.16.3-py2.7.egg/scrapy/contrib/pipeline/images.py", line 261, in get_images
        orig_image = Image.open(StringIO(response.body))
File "/virtualenvs/asd/local/lib/python2.7/site-packages/PIL/Image.py", line 1980, in open
        raise IOError("cannot identify image file")
    exceptions.IOError: cannot identify image file

Thanks!
(Sorry for my English.)

1 Answer

  1. Editorial Team
    Added an answer on June 17, 2026 at 6:56 pm

    In case it helps someone else, I solved this as follows.

    In item.py, add these fields to the item:

          body = Field()
          url = Field()
    

    In the spider (inside def parse()), add this code:

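    import os
    import urllib2
    
    (...)
    
        # Select the src attribute of each image
        relative_urls = info.select('tr/td/a[@class="image"]/img/@src').extract()
    
        for relative_url in relative_urls:
            # Build the static URL: keep the part before "svg", drop the
            # leading "//" and the trailing ".", then restore the extension
            relative_url = relative_url.split("svg")[0][2:-1] + ".svg"
            # Remove the "/thumb" path segment
            relative_url = ''.join(relative_url.split("/thumb")).strip()
    
            relative_url = "http://" + relative_url
    
            # Download the SVG directly instead of going through the images pipeline
            response = urllib2.urlopen(relative_url)
            data = response.read()
            # Name the file after the last URL segment (the original snippet
            # used an undefined name, img.svg, here)
            filename = os.path.basename(relative_url)
            with open(os.path.join('/home/user/virtualenvs', filename), "wb") as svg_file:
                svg_file.write(data)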
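
    The URL clean-up inside the loop can be pulled into a small helper so it is easier to test on its own. This is only a sketch: the function name thumb_src_to_svg_url and the example.org address are made up for illustration, and it assumes the src attribute is a protocol-relative thumbnail URL of the shape shown in the usage line. It runs unchanged on Python 2.7 and Python 3.

```python
def thumb_src_to_svg_url(src):
    """Turn a protocol-relative thumbnail src into an absolute .svg URL.

    Hypothetical helper: mirrors the string handling used in the spider loop.
    """
    # Keep everything before the first "svg", then drop the leading "//"
    # and the trailing "." that preceded the extension.
    stem = src.split("svg")[0][2:-1]
    # Restore the extension and remove the "/thumb" path segment.
    path = "".join((stem + ".svg").split("/thumb")).strip()
    return "http://" + path


# Usage with an assumed src value (not taken from a real site)
example_src = "//example.org/images/thumb/a/ab/Logo.svg/120px-Logo.svg.png"
print(thumb_src_to_svg_url(example_src))
# prints http://example.org/images/a/ab/Logo.svg
```

    Note that splitting on "svg" is fragile: a URL with "svg" earlier in its path would be cut in the wrong place, so this only suits sources whose URLs follow the assumed pattern.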
    

    It works for me.

    (Obviously, you can split this code between the spider and the pipeline.)
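
    One way that split might look, as a rough sketch rather than a tested Scrapy setup: the spider fills the body and url fields declared in item.py, and a separate class with a process_item(item, spider) method writes the file. The class name SvgFilePipeline, the target_dir argument, and the sample URL are assumptions for illustration; a plain dict stands in for the item so the example runs without Scrapy installed.

```python
import os
import tempfile


class SvgFilePipeline(object):
    """Hypothetical pipeline: saves item['body'] under the URL's file name."""

    def __init__(self, target_dir):
        # Directory the SVG files are written to (assumed to exist).
        self.target_dir = target_dir

    def process_item(self, item, spider):
        # Use the last path segment of the URL as the file name.
        filename = os.path.basename(item["url"])
        path = os.path.join(self.target_dir, filename)
        with open(path, "wb") as handle:
            handle.write(item["body"])
        return item


# Usage: a dict plays the role of the item the spider would yield.
demo_dir = tempfile.mkdtemp()
pipeline = SvgFilePipeline(demo_dir)
demo_item = {
    "url": "http://example.org/images/a/ab/Logo.svg",
    "body": b"<svg xmlns='http://www.w3.org/2000/svg'/>",
}
pipeline.process_item(demo_item, spider=None)
saved_path = os.path.join(demo_dir, "Logo.svg")
```

    In a real project the spider would do the download and set item['body'], and the pipeline would be enabled in the project settings; how the target directory is passed in depends on the Scrapy version in use.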

© 2021 The Archive Base. All Rights Reserved