I have a 3-step producer/consumer setup.
Client creates JSON-encoded dictionaries and sends them to PipeServer via a named pipe.
Here are my threading.Thread subclasses:
PipeServer creates a named pipe and places incoming messages into a queue, unprocessed_messages.
Processor gets items from unprocessed_messages, processes them (via a function passed in as an argument, here a lambda), and puts the results into a second queue, processed_messages.
Printer gets items from processed_messages, acquires a lock, prints the message, and releases the lock.
In the test script, I have one PipeServer, one Processor, and 4 Printers:
pipe_name = '\\\\.\\pipe\\testpipe'
pipe_server = pipetools.PipeServer(pipe_name, unprocessed_messages)
json_loader = lambda x: json.loads(x.decode('utf-8'))
processor = threadedtools.Processor(unprocessed_messages,
                                    processed_messages,
                                    json_loader)
print_servers = []
for i in range(4):
    print_servers.append(threadedtools.Printer(processed_messages,
                                               output_lock,
                                               'PRINTER {0}'.format(i)))
pipe_server.start()
processor.start()
for print_server in print_servers:
    print_server.start()
Question: in this kind of multi-step setup, how do I think through optimizing the number of Printer vs. Processor threads I should have? For example, how do I know if 4 is the optimal number of Printer threads to have? Should I have more processors?
I read through the Python Profilers docs, but didn’t see anything that would help me think through these kinds of tradeoffs.
Generally speaking, you want to optimize for the maximum throughput of your slowest component. In this case, it sounds like either Client or Printer. If it’s the Client, you want just enough Printers and Processors to be able to keep up with new messages (maybe that’s just one!). Otherwise you’ll be wasting resources on threads you don’t need.
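One rough way to see which stage is the slowest is to watch how the backlog in each queue changes under load. The sketch below assumes unprocessed_messages and processed_messages are standard queue.Queue objects (the question does not say so explicitly); sample_backlog and growing are hypothetical helper names, not part of any library. Note that qsize() only returns an approximate size, which is good enough for spotting a trend.

```python
import queue
import time


def sample_backlog(queues, interval=0.5, samples=10):
    """Record the approximate size of each queue at fixed intervals.

    queues: dict mapping a label to a queue.Queue (assumed type).
    Returns a dict mapping each label to its list of sampled sizes.
    """
    history = {name: [] for name in queues}
    for _ in range(samples):
        for name, q in queues.items():
            history[name].append(q.qsize())  # approximate, not exact
        time.sleep(interval)
    return history


def growing(sizes):
    """Return True if the backlog ended higher than it started."""
    return len(sizes) >= 2 and sizes[-1] > sizes[0]
```

If the samples for unprocessed_messages keep growing, the Processor stage is probably falling behind; if processed_messages keeps growing, the Printers are. If both stay near zero, the Client is likely the limiting factor and extra threads will not help.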
If it’s Printers, then you need to optimize for the IO that’s occurring. A few variables to take into account:
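For example, if every Printer has to hold the same single lock to print, only one of them can print at a time, so you should only have one Printer thread; the same reasoning applies to any other shared resource.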
You then want to test under real-world conditions (it’s difficult to predict what combination of RAM, disk and network activity will slow you down). Instrument your code so you can see how many threads are idle at any given time. Then create a test case that pushes data through the system at maximum throughput. Start with an arbitrary number of threads for each component. If Client, Processor, or Printer threads are always busy, add more threads of that kind. If some threads are always idle, take some away.
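As a minimal sketch of that kind of instrumentation, a worker thread can time how long it spends blocked on its input queue versus doing work. InstrumentedWorker and run_trial are hypothetical names for illustration, not the Processor or Printer classes from the question, and the None sentinel used to stop the workers is an assumption of this sketch.

```python
import queue
import threading
import time


class InstrumentedWorker(threading.Thread):
    """Hypothetical worker that records idle time versus busy time."""

    def __init__(self, source, handler):
        super().__init__(daemon=True)
        self.source = source
        self.handler = handler
        self.idle_seconds = 0.0
        self.busy_seconds = 0.0

    def run(self):
        while True:
            wait_start = time.perf_counter()
            item = self.source.get()  # time blocked here counts as idle
            work_start = time.perf_counter()
            self.idle_seconds += work_start - wait_start
            if item is None:  # sentinel (an assumption): stop this worker
                break
            self.handler(item)
            self.busy_seconds += time.perf_counter() - work_start

    def utilization(self):
        """Fraction of measured time spent working, between 0.0 and 1.0."""
        total = self.idle_seconds + self.busy_seconds
        return self.busy_seconds / total if total else 0.0


def run_trial(worker_count, items, handler):
    """Run the items through worker_count workers; return each utilization."""
    source = queue.Queue()
    workers = [InstrumentedWorker(source, handler) for _ in range(worker_count)]
    for worker in workers:
        worker.start()
    for item in items:
        source.put(item)
    for _ in workers:
        source.put(None)  # one sentinel per worker
    for worker in workers:
        worker.join()
    return [worker.utilization() for worker in workers]
```

Running run_trial with different worker counts against a realistic workload gives a utilization figure per thread: values close to 1.0 suggest that stage could use more threads, while values close to 0.0 suggest there are more than it needs.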
You may need to retune if you move the code to a different hardware environment – different number of processors, more memory, different disk can all have an effect.