I have the following function, which returns the size of a file over HTTP:
    import urllib2

    def GetFileSize(url):
        " Function gets a url and returns its filesize in bytes "
        url = url.replace(' ', '%20')
        u = urllib2.urlopen(url)
        meta = u.info()
        file_size = int(meta.getheaders("Content-Length")[0])
        return file_size
I would like to get the biggest file from a given list of links, and I wrote the following function for it:
    def GetBiggestFile(links):
        " Function gets a list of links and returns the biggest file and its size in bytes "
        dic = {}
        for link in links:
            filename = link.split('/')[-1]
            filesize = GetFileSize(link)
            dic[link] = filesize
            print "%s | %.2f MB" % (filename, filesize / 1024.0 / 1024.0)
        biggest_file = max(dic, key=dic.get)
        return biggest_file, dic[biggest_file]
My lists have dozens of links, so this script takes some time to complete. Using threading, I could fetch the different file sizes concurrently and shorten the running time of the code.
I’m not so sure how to do it – I’ve tried using a decorator that makes the function run asynchronously:
    def run_async(func):
        " Decorator for running functions asynchronously. "
        from threading import Thread
        from functools import wraps
        @wraps(func)
        def async_func(*args, **kwargs):
            func_hl = Thread(target = func, args = args, kwargs = kwargs)
            func_hl.start()
            return func_hl
        return async_func
But I’m not sure how to make my code wait for all the answers before trying to determine which file is the biggest.
Thanks.
You’ll be happier with multiprocessing.
Start with this example: http://docs.python.org/library/multiprocessing.html#using-a-pool-of-workers
Your GetFileSize function can be run in a process pool. Since each process is separate, you should also have an “output Queue” into which the results are put. A separate process does a simple “get” to retrieve all the answers from the Queue.
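As a rough sketch of the pool idea, something like the following might work. It is not the only way to do it, and several things here are assumptions: `get_file_size` and `get_biggest_file` are hypothetical snake_case rewrites of the functions in the question, and the `size_func` and `workers` parameters are added purely for illustration. In this simple case `pool.map` already waits for every worker and hands back the return values in order, so no explicit output Queue is needed. The sketch imports the pool from `multiprocessing.dummy`, which offers the same interface backed by threads; changing the import to `from multiprocessing import Pool` should switch it to real processes, provided the worker function is defined at module level so it can be pickled.

```python
# Thread-backed pool with the same API as multiprocessing.Pool.
from multiprocessing.dummy import Pool

try:
    from urllib2 import urlopen          # Python 2, as in the question
except ImportError:
    from urllib.request import urlopen   # Python 3


def get_file_size(url):
    """Return the Content-Length of url in bytes (hypothetical rewrite of GetFileSize)."""
    response = urlopen(url.replace(' ', '%20'))
    try:
        # Raises TypeError if the server sends no Content-Length header.
        return int(response.info().get("Content-Length"))
    finally:
        response.close()


def get_biggest_file(links, size_func=get_file_size, workers=8):
    """Return (link, size) for the largest file, fetching the sizes in parallel.

    size_func and workers are illustrative parameters, not part of the question.
    """
    pool = Pool(workers)
    try:
        # map() blocks until every call has returned and preserves input order.
        sizes = pool.map(size_func, links)
    finally:
        pool.close()
        pool.join()
    # Pair each link with its size and pick the pair with the largest size.
    return max(zip(links, sizes), key=lambda pair: pair[1])
```

Calling `get_biggest_file(links)` returns the same `(link, size)` pair as the original `GetBiggestFile`, but the total wait is closer to the slowest single request than to the sum of all of them. Because the work is dominated by network I/O, threads are usually enough here; processes mainly help when the work is CPU-bound.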