I’m working on a simple experiment in Python. I have a “master” process in charge of all the others, and every other process has a connection to the master over a Unix socket. I would like the master process to monitor all of the sockets for a response, but there could theoretically be almost a hundred of them. How would threads impact the memory and performance of the application? What would be the best solution? Thanks a lot!
One hundred simultaneous threads might be pushing the reasonable limits of threading. If you find this is the cleanest way to organize your code, I’d say give it a try, but threading really doesn’t scale very far.
What works better is to use a technique like select to wait until one of the sockets is readable, is writable, or has an error to report. This mechanism lets you go to sleep until something interesting happens, handle every socket that has content waiting, and then go back to sleep again, all in a single thread of execution. Removing the multi-threading can often reduce chances for errors, and this style of programming should get you into the hundreds of connections with no trouble. (If you want to go beyond about 100, I’d use the poll functionality instead of select: constantly rebuilding the list of interesting file descriptors takes time that poll does not require.)
Something to consider is the Python Twisted Framework. They’ve gone to some length to provide a consistent way to hook callbacks onto events for this exact sort of programming. (If you’re familiar with node.js, it’s a bit like that, but Python.) I must admit a slight aversion to Twisted (I never got very far in their documentation without being utterly baffled), but a lot of people made it further in the docs than I did. You might find it a better fit than I have.