I am using Python’s multiprocessing to do bulk downloads over FTP. However, when I try to open more than 5 FTP sessions, an EOFError is raised, meaning the host is disconnecting me for opening too many sessions.
The only solution I see is to open a single FTP object and pass it to the functions that need it. The problem is that multiprocessing uses pickling to move objects between processes, and FTP objects can’t be pickled, so this is not possible. My question is thus whether it is possible to work around this by finding a way to pickle FTP objects.
My code is of the following form:

    from ftplib import FTP

    def get_file(name):
        # code here

    def worker(name_list, out_q):
        lst = []
        for name in name_list:
            lst.append(get_file(name))
        out_q.put(lst)

    if __name__ == '__main__':
        # establish the FTP connection
        ftp = FTP('ftp.blah.blah', 'anonymous', 'meow')
        # multiprocessing code here

The get_file function needs access to the FTP connection, but if I create the connection outside of the if __name__ == '__main__' block, a new FTP connection is created each time a process runs through the code.
I don’t really understand why you would want to do that. How exactly does this solve your problem?
But instead of serializing the FTP object, create a process for FTP requests and devise a mini-language for communicating with that process. Let your other processes send (easily pickleable) messages of the form

    get src dst

EDIT: Just checked the documentation for [ftplib][1]. Nowhere does it say it can handle multiple calls. Assume it doesn’t!

So, I would do this:
MAX_CONNECTIONS (e.g. 5) FTP worker processes that
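
One way to picture the "FTP worker processes plus simple messages" idea is the rough sketch below. It is not the answerer's code, only an illustration under assumptions: the names `connect`, `ftp_worker`, and `download_all` are hypothetical, the host and credentials are the placeholders from the question, and `MAX_CONNECTIONS = 5` is the limit the asker observed. Each worker opens its own FTP connection, so no FTP object is ever pickled; only plain `('get', src, dst)` tuples travel through the queues.

```python
import multiprocessing as mp
from ftplib import FTP

MAX_CONNECTIONS = 5  # assumed server limit, taken from the question


def connect():
    # Hypothetical factory; host and credentials are the question's placeholders.
    return FTP('ftp.blah.blah', 'anonymous', 'meow')


def ftp_worker(task_q, result_q, connect=connect):
    """Own one FTP connection and serve ('get', src, dst) messages until None arrives."""
    ftp = connect()
    try:
        # iter(callable, sentinel) keeps calling task_q.get() until it returns None.
        for command, src, dst in iter(task_q.get, None):
            if command != 'get':
                result_q.put((src, dst, 'unknown command'))
                continue
            try:
                with open(dst, 'wb') as fh:
                    ftp.retrbinary('RETR ' + src, fh.write)
                result_q.put((src, dst, 'ok'))
            except Exception as exc:
                result_q.put((src, dst, 'error: %s' % exc))
    finally:
        ftp.quit()


def download_all(jobs, n_workers=MAX_CONNECTIONS):
    """Send each (src, dst) pair to a pool of at most n_workers FTP processes."""
    jobs = list(jobs)
    task_q, result_q = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=ftp_worker, args=(task_q, result_q))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for src, dst in jobs:
        task_q.put(('get', src, dst))
    for _ in workers:
        task_q.put(None)  # one stop sentinel per worker
    results = [result_q.get() for _ in jobs]  # exactly one reply per job
    for w in workers:
        w.join()
    return results
```

From a script this would be called inside the `if __name__ == '__main__':` block, for example `download_all([('a.txt', 'a.txt')])`. Note that the sketch does not handle a worker that fails to connect: in that case `download_all` would wait forever for replies, so real code would want a timeout or a startup handshake.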