I have to write a little daemon that can check multiple (could be up to several hundred) email accounts for new messages.
My thoughts so far:
I could just create a new thread for each connection, using imapclient for retrieving the messages every x seconds, or use IMAP IDLE where possible. I also could modify imapclient a bit and select() over all the sockets where IMAP IDLE is activated using a single thread only.
Are there any better approaches for solving this task?
If only you’d asked a few months from now, because Python 3.4 will probably have a spiffy new async API. See http://code.google.com/p/tulip/ for the current prototype, but you probably don’t want to use it yet.
If you’re on Windows, you may be able to handle a few hundred threads without a problem. If so, it’s probably the simplest solution. So, try it and see.
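If you want to try the threaded version first, here’s a rough sketch of what I mean. It is only an outline under some assumptions: `check` is a hypothetical callable that you would write yourself (for example on top of imapclient) and that returns the new messages for one account, and `on_new` is a hypothetical callback for whatever you do with them. Neither name comes from any library.

```python
import threading


def poll_account(account, check, interval, stop, on_new):
    """Thread body: call check(account) every `interval` seconds until stopped."""
    while not stop.is_set():
        try:
            new_messages = check(account)  # hypothetical: your IMAP fetch goes here
        except Exception as exc:
            # One broken account should not take the whole daemon down.
            print(f"{account}: check failed: {exc}")
        else:
            if new_messages:
                on_new(account, new_messages)
        # Event.wait() works as a sleep that can be interrupted by stop.set().
        stop.wait(interval)


def start_pollers(accounts, check, interval, on_new):
    """Start one daemon thread per account; return the stop event and the threads."""
    stop = threading.Event()
    threads = [
        threading.Thread(
            target=poll_account,
            args=(account, check, interval, stop, on_new),
            daemon=True,
        )
        for account in accounts
    ]
    for thread in threads:
        thread.start()
    return stop, threads
```

Calling `stop.set()` and then joining the threads shuts everything down cleanly. Whether a few hundred of these threads is acceptable is exactly the thing to measure on your own platform.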
If you’re on Unix, you probably want to use poll instead of select, because select scales badly when you get into the hundreds of connections. (epoll on Linux or kqueue on Mac/BSD are even more scalable, but it doesn’t usually matter until you get into the thousands of connections.)
But there are a few things you might want to consider before doing this yourself:
Twisted is definitely the hardest of these to get into, but it also comes with an IMAP client ready to go, among hundreds of other things, so if you’re willing to deal with a bit of a learning curve, you may be done a lot faster.
Tornado feels the most like writing native select-type code. I don’t actually know all of the features it comes with; it may have an IMAP client, but if not, you’ll be hacking up imapclient the same way you were considering with select.
Monocle sits on top of either Twisted or Tornado, and lets you write code that’s kind of like what’s coming in 3.4. (Actually, you can do the same thing directly in Twisted with inlineCallbacks; it’s just that the docs discourage you from learning that without learning everything else first.) Again, you’d be hacking up imapclient here. (Or using Twisted’s IMAP client instead… but at that point, you might as well use Twisted directly.)
gevent lets you write code that’s almost the same as threaded (or synchronous) code and just magically makes it asynchronous. You may need to hack up imapclient a bit, but it may be as simple as running the magic monkeypatching utility, and that’s it. And beyond that, you write the same code you’d write with threading, except that you create a bunch of greenlets instead of a bunch of threads, and you get an order of magnitude or two better scalability.
If you’re looking for the absolute maximum scalability, you’ll probably want to parallelize and multiplex at the same time (e.g., run 8 processes, each using gevent, on Unix, or attach a native threadpool to IOCP on Windows), but for a few hundred connections this shouldn’t be necessary.
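For comparison, if you do go the do-it-yourself route and multiplex all the idling connections in a single thread, the core of it could look something like the sketch below. It uses the standard library’s `selectors` module (available from Python 3.4), whose `DefaultSelector` picks epoll, kqueue, poll or select depending on the platform. The `IdleWatcher` class and its method names are made up for illustration, and the sketch assumes you can get at the plain socket under each IMAP connection. SSL sockets buffer data internally, so a real implementation would need extra care there.

```python
import selectors
import socket


class IdleWatcher:
    """Wait on many sockets at once using the best selector for this platform."""

    def __init__(self):
        self._sel = selectors.DefaultSelector()

    def add(self, name, sock):
        # `name` identifies the account; it is handed back when data arrives.
        sock.setblocking(False)
        self._sel.register(sock, selectors.EVENT_READ, data=name)

    def remove(self, sock):
        self._sel.unregister(sock)

    def wait(self, timeout=None):
        """Return a list of (name, bytes) for every socket with data pending."""
        ready = []
        for key, _events in self._sel.select(timeout):
            ready.append((key.data, key.fileobj.recv(4096)))
        return ready

    def close(self):
        self._sel.close()


if __name__ == "__main__":
    # Stand-in for a server: a local socket pair instead of a real IMAP connection.
    local, remote = socket.socketpair()
    watcher = IdleWatcher()
    watcher.add("demo-account", local)
    remote.sendall(b"* 3 EXISTS\r\n")  # the kind of line a server sends during IDLE
    print(watcher.wait(timeout=1))
    watcher.close()
    local.close()
    remote.close()
```

In a real daemon the bytes returned by `wait()` would be fed back into the IMAP parsing code for that account, which is the part where you’d be hacking up imapclient.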