So, I have a script I’m working on to back up a large server directory of files to a number of FTP accounts/services/whatever (at the moment the poor secretary has a copy-and-paste document to do this, but anyways I’m close to having a working script to save her from that =D).
I haven’t really messed around with threading or multiprocessing before, and I can’t figure out how to get it to take the list of files and upload ’em all to the host 3-5 at a time (in this example, I’m trying 5, but I dunno what I’ll decide on).
import os, sys, subprocess, shutil, re, string, glob, tvdb_api, itertools, multiprocessing, ftplib
files = [os.path.join(r, f) for r, d, fs in os.walk(os.getcwd()) for f in fs if not f[0]=='.']
class FTP_Upload:
    def __init__(self, p=os.getcwd()):
        self.files_to_upload = sorted([f for f in files if os.path.split(f)[0] == p])
        self.target = raw_input("Enter the host you want to upload to: ")
        self.host = FTP('ftp.host1.com', 'user_name1', 'super_secret_password1') if self.target == 'host' else FTP('ftp.host2.com', 'user_name2', 'secret_password2') if self.target == 'host2' else None
    def upload_files(self, f):
        self.host.storbinary(('STOR /'+f.split('/')[-1]), open(f, 'rb'))
    def multiupload(self):
        p = multiprocessing.Pool(processes=5)
        p.map(self.upload_files(f), self.files_to_upload)
FTP_Upload().multiupload()
But this just uploads the last file in self.files_to_upload…
I tried just making the file list an iterable
self.files_to_upload = iter(sorted([f for f in files if os.path.split(f)[0] == p]))
But no joy.
Thanks in advance for any help!
If I understand you correctly, this sort of thing can be done quite easily with multiprocessing. Just write a function to upload one file, and then use multiprocessing to map it over a list of files.
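A minimal sketch of those two steps, written for Python 3. The host name, credentials, and the helper names `remote_command`, `upload_file` and `upload_all` are placeholders for illustration, not anything from the question. Each call opens its own connection, on the assumption that an open FTP connection cannot be handed from one process to another.

```python
import ftplib
import multiprocessing
import os

# Placeholder connection details (assumptions for illustration only).
HOST = "ftp.example.com"
USER = "user_name"
PASSWORD = "password"


def remote_command(path):
    """Build the FTP STOR command for a local file path."""
    return "STOR " + os.path.basename(path)


def upload_file(path):
    """Upload a single file over its own FTP connection."""
    with ftplib.FTP(HOST, USER, PASSWORD) as ftp:
        with open(path, "rb") as handle:
            ftp.storbinary(remote_command(path), handle)
    return path


def upload_all(paths, workers=5):
    """Upload every path, running at most `workers` uploads at once."""
    with multiprocessing.Pool(processes=workers) as pool:
        # Pass the function itself; the pool calls it once per path.
        return pool.map(upload_file, paths)
```

You would call `upload_all(list_of_paths)` from under an `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that start workers by re-importing the script.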
You can also play around with the chunksize argument to map, which can speed things up a little bit if the uploads are quick.
Of course, if you need to pass more information than just the filename, one really easy way to accomplish that would be to make your list of files a list of tuples and unpack them in the function.
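The tuple idea might look something like the sketch below (Python 3). The names `upload_job` and `run_jobs` and the `(local_path, remote_dir)` layout are made up for illustration, and the worker only builds the STOR command instead of talking to a server, so the unpacking is easy to see.

```python
import multiprocessing
import os


def upload_job(job):
    """Unpack one (local_path, remote_dir) tuple and build the STOR command."""
    local_path, remote_dir = job
    remote_path = remote_dir.rstrip("/") + "/" + os.path.basename(local_path)
    # A real version would open an FTP connection here and call storbinary.
    return "STOR " + remote_path


def run_jobs(jobs, workers=5):
    """Map the single-argument worker over a list of tuples."""
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(upload_job, jobs)
```

The worker still takes exactly one argument, which is what `map` expects; the tuple is just a way of packing several values into that one argument.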
WARNING
Some might consider this bad practice since you’re essentially using a map function for side-effects…
EDIT
I think your problem is
p.map(self.upload_files(f), self.files_to_upload)
I’m not familiar with FTP in Python, so I can’t say for sure, but you want to pass a function as the first parameter to p.map. You’re passing the output of the function. It’s possible that you wrote a function which returns a function, but it doesn’t look like it from the code above. What you probably want is:
p.map(self.upload_files, self.files_to_upload)
In general, a call to a map function can be translated to a list comprehension as follows:
map(function, iterable)
is almost equivalent to
[function(x) for x in iterable]
(almost equivalent because in Python 3.x map returns a lazy iterator rather than a list). Notice that in map you don’t actually call the function.
Final Edit (hopefully)
You’re running into an (unfortunate) limitation of multiprocessing. All the objects that you send around must be picklable. Apparently your instance method (a method bound to an instance of a class) is not picklable. One solution is to change it into a regular module-level function that takes everything it needs as its argument. Hopefully that will work out for you. Good luck!
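A minimal sketch of that regular-function approach, for Python 3. The class layout, the `credentials` tuple and the names `upload_one` and `jobs` are assumptions for illustration, not the asker's actual code. The idea is that the class keeps only plain data, and the function that the pool calls lives at module level, so it can be pickled by name.

```python
import ftplib
import multiprocessing
import os


def upload_one(job):
    """Module-level worker; pickled by reference to its module and name."""
    (host, user, password), path = job
    with ftplib.FTP(host, user, password) as ftp:
        with open(path, "rb") as handle:
            ftp.storbinary("STOR " + os.path.basename(path), handle)
    return path


class FTPUpload:
    def __init__(self, credentials, files_to_upload):
        # Store only plain data (strings), never an open FTP connection.
        self.credentials = credentials
        self.files_to_upload = sorted(files_to_upload)

    def jobs(self):
        """Pair the credentials with each file so the worker gets one argument."""
        return [(self.credentials, path) for path in self.files_to_upload]

    def multiupload(self, workers=5):
        """Run up to `workers` uploads at a time using the module-level worker."""
        with multiprocessing.Pool(processes=workers) as pool:
            return pool.map(upload_one, self.jobs())
```

Only strings and tuples cross the process boundary here, so nothing unpicklable (such as a live connection or a bound method) is sent to the workers.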