The Python code below connects to lots of servers, grabs some info from each one and returns the results. It currently kicks off a separate thread for each connection. I would like to see how performance is affected by using a separate process for each connection rather than a thread. Can this code be easily changed to use processes instead of threads? What exactly would I need to do? What are the risks, if any?
Python 2.6 / Platform Linux
class ServerInfoGetter(threading.Thread):

    def __init__(self, host, port=DEFAULT_PORT, timeout=15):
        self.host = host
        self.timeout = timeout
        self.port = port
        self.result = None
        threading.Thread.__init__(self)

    def get_result(self):
        return self.result

    def run(self):
        try:
            serv_check = ServCheck(self.host,
                                   port=self.port,
                                   timeout=self.timeout)
            serv_check.get_info()
            self.result = serv_check
        except Exception, err:
            logging.debug("Could not run ServCheck for : %s %s", self.host, err)

def process_hosts(hosts_and_ports):

    def producer(queue, hosts_and_ports):
        for host, ports in hosts_and_ports.items():
            for port in ports:
                logging.info("processing host: %s:%s", host, port)
                thread = ServerInfoGetter(str(host), port)
                thread.start()
                queue.put(thread, True)  # True so block until slot available

    results = []

    def consumer(queue, total_checks):
        while len(results) < total_checks:
            thread = queue.get(True)
            thread.join()
            results.append(thread.get_result())

    logging.info("processing hosts")
    queue = Queue(QUEUE_SIZE)
    prod_thread = threading.Thread(target=producer,
                                   args=(queue, hosts_and_ports))
    cons_thread = threading.Thread(target=consumer,
                                   args=(queue,
                                         calculate_total_checks(hosts_and_ports)))
    prod_thread.start()
    cons_thread.start()
    prod_thread.join()
    cons_thread.join()
    return results

As the multiprocessing documentation says, the package supports spawning processes using an API similar to the threading module.

So, basically, you just have to replace all threading.Thread objects with multiprocessing.Process objects (and similarly, the queue needs to be replaced with a multiprocessing.Queue object).

At least, that's how it would appear. In practice, each process works on its own copy of your objects, so any state that has to be seen on the other side of a Process boundary must go through a multiprocessing primitive such as a multiprocessing.Value or a multiprocessing.Queue. Otherwise, changes made in one process will never show up in another.

If you're only going to modify the ServerInfoGetter class, this matters most for self.result, which run() assigns in the child process and the parent reads afterwards. self.host, self.timeout and self.port are set before the process starts and are only read in the child, so the child's copies are enough. Read the rest of the multiprocessing doc to get an idea of the other data types that you'll need to use.

Also, as a side note, I'm not sure if it would be a problem for Python 2.6 on Linux, but for Python 2.7 on Windows, both IDLE and the interactive interpreter have trouble (for me, at least) with multiprocessing. These problems go away when directly executing the script with the python or pythonw executables. Update: Python 2.5.1 on my Slackware box doesn't have this problem, so you may be fine in interactive mode as well... although winwaed wasn't, so who knows...?
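
To make the point about results crossing the process boundary concrete, here is a minimal sketch of a process-based ServerInfoGetter that sends its result back through a multiprocessing.Queue instead of storing it on self. It is not a drop-in replacement for the code in the question: fake_check is a hypothetical stand-in for ServCheck, and the sketch is written for Python 3 on a Unix-like system, where get_context("fork") requests the same fork behaviour that Python 2.6 on Linux always uses. On 2.6 itself you would use multiprocessing.Process and multiprocessing.Queue directly.

```python
import logging
import multiprocessing

# Assumption: Unix-like system. "fork" matches what Python 2.6 on Linux does.
ctx = multiprocessing.get_context("fork")


def fake_check(host, port, timeout):
    """Hypothetical stand-in for ServCheck(...).get_info()."""
    return {"host": host, "port": port, "timeout": timeout}


class ServerInfoGetter(ctx.Process):
    def __init__(self, host, port, result_queue, timeout=15):
        ctx.Process.__init__(self)
        self.host = host
        self.port = port
        self.timeout = timeout
        self.result_queue = result_queue

    def run(self):
        # Runs in the child process. Assigning to self.result here would
        # only change the child's copy, so the result goes onto the queue.
        try:
            result = fake_check(self.host, self.port, self.timeout)
        except Exception as err:
            logging.debug("Could not run check for %s: %s", self.host, err)
            result = None
        self.result_queue.put((self.host, self.port, result))


def process_hosts(hosts_and_ports):
    result_queue = ctx.Queue()
    workers = []
    for host, ports in hosts_and_ports.items():
        for port in ports:
            worker = ServerInfoGetter(str(host), port, result_queue)
            worker.start()
            workers.append(worker)
    # Drain the queue before joining: a child may not exit until the data
    # it queued has been consumed.
    results = [result_queue.get() for _ in workers]
    for worker in workers:
        worker.join()
    return results
```

Two things to keep in mind with this approach: anything put on the queue is pickled, so the real ServCheck result would have to be picklable, and results arrive in completion order rather than in the order the processes were started. The sketch also starts one process per connection with no upper bound, unlike the bounded queue in the question.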