In order to test some security software, I need to be able to create a large (configurable) number of new processes (not threads!) in Windows, very quickly, have them exist for a (configurable) period of time, then terminate cleanly. The processes shouldn’t do anything at all – just exist for the specified duration.
Ultimately, I want to be able to run something like:
C:\> python process_generate.py --processes=150 --duration=2500
which would create 150 new processes very quickly, keep them all alive for 2500ms, then have them all terminate as quickly as possible.
As a starting point, I ran
from multiprocessing import Process
import datetime

def f():
    pass

if __name__ == '__main__':
    count = 0
    starttime = datetime.datetime.now()
    # Start and immediately terminate one process at a time, forever.
    while True:
        p = Process(target=f)
        p.start()
        p.terminate()
        count += 1
        # Report progress every 1000 processes.
        if count % 1000 == 0:
            now = datetime.datetime.now()
            print("Started & stopped %d processes in %s seconds" % (count, str(now - starttime)))
and found I could create and terminate about 70 processes/second serially on my laptop, with the created processes terminating straightaway. That rate of roughly 70 processes/second was sustained for about an hour.
When I changed the code to
from multiprocessing import Process
import datetime
import time

def f_sleep():
    time.sleep(1)

if __name__ == '__main__':
    starttime = datetime.datetime.now()
    processes = []
    PROCESS_COUNT = 100
    # Start all processes first...
    for i in range(PROCESS_COUNT):
        p = Process(target=f_sleep)
        processes.append(p)
        p.start()
    # ...then terminate them all.
    for p in processes:
        p.terminate()
    now = datetime.datetime.now()
    print("Started/stopped %d processes in %s seconds" % (len(processes), str(now - starttime)))
and tried different values for PROCESS_COUNT, I expected it to scale a lot better than it did. I got the following results for different values of PROCESS_COUNT:
- 20 processes completed in 0.72 seconds
- 30 processes completed in 1.45 seconds
- 50 processes completed in 3.68 seconds
- 100 processes completed in 14 seconds
- 200 processes completed in 43 seconds
- 300 processes completed in 77 seconds
- 400 processes completed in 111 seconds
This is not what I expected – I expected to be able to scale up the parallel process count in a reasonably linear fashion till I hit a bottleneck, but I seem to be hitting a process creation bottleneck almost straightaway. I definitely expected to be able to create something close to 70 processes/second before hitting a process creation bottleneck, based on the first code I ran.
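To make the comparison with the serial figure of about 70 processes/second concrete, the measurements above can be converted to an effective rate. This is only arithmetic on the numbers already listed; the names `results` and `rates` are made up for illustration.

```python
# Measurements copied from the list above: (process count, total seconds).
results = [(20, 0.72), (30, 1.45), (50, 3.68), (100, 14.0),
           (200, 43.0), (300, 77.0), (400, 111.0)]

def rates(measurements):
    """Return (count, processes per second) for each measurement."""
    return [(n, n / float(secs)) for n, secs in measurements]

for n, rate in rates(results):
    print("%3d processes: %5.1f processes/second" % (n, rate))
```

The rate falls from about 28 processes/second at 20 processes to about 3.6 processes/second at 400, so every batch run is well below the serial rate.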
Without going into the full specs, the laptop runs fully patched Windows XP, has 4 GB of RAM, is otherwise idle and is reasonably new; I don’t think it should be hitting a bottleneck this quickly.
Am I doing anything obviously wrong here with my code, or is XP/Python parallel process creation really that inefficient on a 12-month-old laptop?
After profiling and testing a bunch of different scenarios, I found that it’s simply far faster to generate and kill processes one at a time under Windows than to generate N processes at once, kill all N, and then start N again.
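The two patterns being compared can be sketched roughly as follows. This is not the code used for the testing described here: it swaps `multiprocessing.Process` for `subprocess.Popen` so that it runs without a `__main__` guard, and the names `IDLE_CMD`, `run_serial` and `run_batch` are hypothetical. The count of 20 is arbitrary.

```python
import subprocess
import sys
import time

# Hypothetical idle child: a Python process that just sleeps until killed.
IDLE_CMD = [sys.executable, "-c", "import time; time.sleep(60)"]

def run_serial(count):
    """Start and kill one process at a time; return elapsed seconds."""
    start = time.time()
    for _ in range(count):
        child = subprocess.Popen(IDLE_CMD)
        child.terminate()
        child.wait()  # reap the child before starting the next one
    return time.time() - start

def run_batch(count):
    """Start all processes, then kill them all; return elapsed seconds."""
    start = time.time()
    children = [subprocess.Popen(IDLE_CMD) for _ in range(count)]
    for child in children:
        child.terminate()
    for child in children:
        child.wait()
    return time.time() - start

if __name__ == "__main__":
    n = 20
    print("serial: %d processes in %.2f seconds" % (n, run_serial(n)))
    print("batch:  %d processes in %.2f seconds" % (n, run_batch(n)))
```

Running both with the same count on the same machine shows whether the batch pattern is slower there. Note that `terminate()` kills the child forcibly on Windows, as it does in the original scripts.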
My conclusion is that Windows keeps enough resources available to start one process at a time quite quickly, but not enough to start more than one new process concurrently without considerable delay. As others have said, Windows is slow at starting new processes, but apparently the speed also degrades semi-geometrically with the number of concurrent processes already running on the system: starting a single process is quite fast, but when you kick off many at once you hit problems. This applies regardless of the number of CPUs, how busy the machine is (typically <5% CPU in my testing), whether Windows is running on a physical server or a virtual one, and how much RAM is available (I tested with up to 32 GB of RAM, with ~24 GB free). It simply seems to be a limitation of the Windows OS. When I installed Linux on the same hardware, the limitation went away (as per Xavi’s response) and we were able to start many processes concurrently, very quickly.
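For reference, a minimal sketch of the `process_generate.py` command line from the question, assuming Python 3. It is one possible design, not a tested solution to the Windows slowdown: each child is a separate Python interpreter that blocks reading stdin, and the parent releases all children by closing their pipes so they exit on their own rather than being killed. The function name `generate`, the constant `CHILD_CMD` and the default values are assumptions for illustration.

```python
import argparse
import subprocess
import sys
import time

# Hypothetical idle child: block on stdin until the parent closes the pipe.
CHILD_CMD = [sys.executable, "-c", "import sys; sys.stdin.read()"]

def generate(process_count, duration_ms):
    """Start idle children, hold them for duration_ms, then release them.

    The duration is counted from the moment the last child has been started.
    Returns the list of child exit codes.
    """
    children = [subprocess.Popen(CHILD_CMD, stdin=subprocess.PIPE)
                for _ in range(process_count)]
    time.sleep(duration_ms / 1000.0)
    for child in children:
        child.stdin.close()  # EOF on stdin lets the child exit by itself
    return [child.wait() for child in children]

def main(argv=None):
    parser = argparse.ArgumentParser(description="Create idle processes.")
    # Defaults are arbitrary placeholders, not values from the question.
    parser.add_argument("--processes", type=int, default=10)
    parser.add_argument("--duration", type=int, default=1000,
                        help="time to keep the processes alive, in ms")
    args = parser.parse_args(argv)
    start = time.time()
    codes = generate(args.processes, args.duration)
    print("Ran %d processes in %.2f seconds" % (len(codes), time.time() - start))

if __name__ == "__main__":
    main()
```

Invoked as `python process_generate.py --processes=150 --duration=2500`, this would attempt the run described in the question. On Windows the start-up phase would still be subject to the slowdown described above.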