I’m using urllib2, cStringIO and PIL. I need to tune this and make it much faster (at least halving the current time).
I access and load the image using the code below.
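import urllib
from cStringIO import StringIO
from PIL import Image

imageurl = "http://bit.ly/wOqVTE"

@log_performance
def get_image(imageurl):
    # Download the whole image into memory, then let PIL decode it.
    img_file = urllib.urlopen(imageurl)
    data = StringIO(img_file.read())
    im = Image.open(data)
    # Shrink in place to at most 128x128, keeping the aspect ratio.
    size = 128, 128
    im.thumbnail(size, Image.ANTIALIAS)
    return im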
Then process the image using:
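@log_performance
def process_image(image, sample_limit=10000, top=10):
    # getcolors() yields (count, color) tuples, or None if the image
    # has more than sample_limit distinct colors.
    colors = image.getcolors(sample_limit)
    # Sort by count, most frequent first, and keep the top entries.
    sc = sorted(colors, key=lambda x: x[0], reverse=True)
    return sc[:top]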
This takes on average 0.6 seconds to get the image and around 0.006 seconds to process.
How can I speed up the get and load process?
The full gist can be found here: https://gist.github.com/1920167
>>>>Function: get_image, Executed:20, Avg Time:0.558275926113
>>>>Function: process_image, Executed:20, Avg Time:0.00609920024872
I will add a bounty of 50 for anyone who can halve the time.
Since fetching the images is what takes the longest, why not use threading (or gevent) to fetch them concurrently, put the results in a task queue, and process them as they become ready?
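A minimal sketch of that idea, using only the standard library's threading and queue modules. The name fetch_concurrently and its fetch parameter are made up for illustration; fetch stands in for something like the get_image function from the question. It starts one thread per URL, which is an assumption that only suits short URL lists; a bounded pool would be safer for many URLs.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2


def fetch_concurrently(urls, fetch):
    """Yield (url, result, error) tuples as each download finishes.

    `fetch` is any callable taking a URL (hypothetical stand-in for
    get_image). Results arrive in completion order, not input order.
    """
    results = queue.Queue()

    def worker(url):
        # Put failures on the queue too, so the consumer never blocks
        # waiting for a result that will not come.
        try:
            results.put((url, fetch(url), None))
        except Exception as exc:
            results.put((url, None, exc))

    threads = [threading.Thread(target=worker, args=(u,)) for u in urls]
    for t in threads:
        t.start()

    # Exactly one item is queued per URL, so read that many.
    for _ in threads:
        yield results.get()

    for t in threads:
        t.join()
```

With this, the loop over images would look like `for url, im, err in fetch_concurrently(urls, get_image):` followed by `process_image(im)` when `err` is None. The total wait is then roughly the slowest single download rather than the sum of all of them, since the threads spend most of their time blocked on the network.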
Also add a cache for images with the same URL…
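The cache could be as simple as a dictionary keyed by URL. The sketch below is one possible shape, not a definitive implementation; cached_by_url is a hypothetical helper name, and it assumes the wrapped function takes the URL as its only argument. It never evicts entries, so memory grows with the number of distinct URLs.

```python
import threading


def cached_by_url(fetch):
    """Wrap `fetch` so each distinct URL is only fetched once.

    `fetch` is any callable taking a URL (hypothetical stand-in for
    get_image). The lock only guards the dictionary; two threads that
    ask for the same new URL at the same moment may both download it,
    and the first stored result wins.
    """
    cache = {}
    lock = threading.Lock()

    def wrapper(url):
        with lock:
            if url in cache:
                return cache[url]
        result = fetch(url)  # done outside the lock so downloads overlap
        with lock:
            # Keep whichever result was stored first for this URL.
            return cache.setdefault(url, result)

    return wrapper
```

For example, `get_image_cached = cached_by_url(get_image)` could then be passed around in place of get_image. Every caller receives the same cached object for a given URL, so this assumes the later processing step only reads the image and does not modify it.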