I’m wondering what is the best way to implement high availability for a self-developed SMTP Relay Server that needs to handle 30-100 emails per second.
Essentially, this server authenticates various SMTP clients, relays their mail to specific mail servers, and handles errors such as a mail server being unavailable. Hence, a queue is required to hold the email messages, which can contain big attachments. For high availability, the system should support clustering, for which I can use Windows Clustering to set up a primary/secondary active cluster.
I reckon that the Email Queue can reside in:
- Memory (this method is out for obvious reason.)
- RDBMS
- File (I believe this is what IIS Virtual SMTP uses?)
- Embedded DB such as SQLite
- Mix of SQLite and File
- Some fanciful 3rd party Queue product that needs to be installed and configured?
The usual method for high availability is to use an RDBMS such as MySQL, but using an RDBMS for a message queue is going to slow down the performance significantly unless I have a powerful MySQL server. On top of that, I will have to implement MySQL clustering which is not easy. Also, I read somewhere that no one should use MySQL as a queue – http://www.engineyard.com/blog/2011/5-subtle-ways-youre-using-mysql-as-a-queue-and-why-itll-bite-you/
Alternatively, I can use SQLite+File. This will probably be the fastest method (other than pure memory) and the easiest to deploy (nothing to install), but there’s no clustering for SQLite, so if the server crashes, the unsent messages may still be lost.
Options 2 and 4 are the same thing: an RDBMS. It is a poor choice here, since an RDBMS is targeted at complex queries and has to maintain itself from time to time: collect garbage, rebuild indices, grow database files… At any such moment the next INSERT could be slow or even blocked outright for a while. That holds even for version-based engines like Firebird and PostgreSQL, and more so for lock-based ones like MS SQL and SQLite.
If you want it to be reliable, you’d probably have to implement it as several narrow-task workers. That would also give you the benefit of multithreading, using OmniThreadLibrary or AsyncCalls.
The “Sink” worker should receive mail into a memory buffer, perhaps extract some metadata from the headers, and store it in the current receiving queue. Since Win32 threads are expensive, it would be nice to have one thread work with several sockets at once, à la the “actors” frameworks of Erlang/Scala. You’d even be able to move some threads into separate exe’s, the way qmail is designed, to keep crashes localized, and in the future even spread them across a cluster of computers.
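To make the "one thread, several sockets" idea concrete, here is a minimal sketch in Python rather than Delphi, using the standard `selectors` module. The function name `drain_sockets` is made up for illustration, and a real sink would parse SMTP commands instead of just reading each connection to EOF.

```python
import selectors
import socket


def drain_sockets(socks):
    """Read every socket to EOF from a single thread.

    Returns {index_in_socks: received_bytes}. Hypothetical helper,
    not part of any SMTP library.
    """
    sel = selectors.DefaultSelector()
    buffers = {}
    for i, s in enumerate(socks):
        s.setblocking(False)
        buffers[i] = bytearray()
        sel.register(s, selectors.EVENT_READ, data=i)

    open_count = len(socks)
    while open_count:
        # One blocking wait covers all registered sockets.
        for key, _events in sel.select():
            chunk = key.fileobj.recv(65536)
            if chunk:
                buffers[key.data].extend(chunk)
            else:
                # Empty read means the peer closed the connection.
                sel.unregister(key.fileobj)
                open_count -= 1
    sel.close()
    return {i: bytes(b) for i, b in buffers.items()}
```

The point is only that a single OS thread can serve many connections; whether you get there with a selector loop, I/O completion ports, or an actor library is a separate choice.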
The “Dumper” would take an isolated queue and dump it into a contiguous non-SQL file (you’d not want separate files, due to HDD thrashing: data-filesystem-data-filesystem-…). Then it would switch the queues: isolate the aforementioned receiving queue for dumping and substitute it with the queue it has just emptied. This “have two queues and switch them” approach is the common “page flipping” technique used, for example, in 3D gaming. TForm.DoubleBuffered is a similar, if reduced, concept. BTW, you should also have two folders for those files, just like the in-memory queues above.
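A rough sketch of the two-queue flip, again in Python for brevity. Everything here is an assumption for illustration: the class name `FlippingQueue`, the function `dump`, and the on-disk format (a 4-byte big-endian length followed by the raw message). It also assumes a single dumper thread.

```python
import os
import struct
import threading


class FlippingQueue:
    """Two in-memory buffers: one receives while the other is dumped."""

    def __init__(self):
        self._lock = threading.Lock()
        self._receiving = []  # sink workers append here
        self._dumping = []    # touched only by the single dumper

    def put(self, message: bytes) -> None:
        with self._lock:
            self._receiving.append(message)

    def flip(self) -> list:
        """Swap the buffers and return the now-isolated one."""
        with self._lock:
            self._receiving, self._dumping = self._dumping, self._receiving
        return self._dumping


def dump(queue: FlippingQueue, path: str) -> int:
    """Append the isolated batch to one contiguous file; return its size."""
    batch = queue.flip()
    with open(path, "ab") as f:
        for msg in batch:
            f.write(struct.pack(">I", len(msg)))  # assumed record header
            f.write(msg)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to disk
    count = len(batch)
    batch.clear()  # the emptied buffer becomes "receiving" on the next flip
    return count
```

The lock is held only for the append and for the swap itself, so sinks never wait on disk I/O. Messages that have been accepted but not yet dumped still live only in memory, so when you acknowledge the SMTP client is a design decision this sketch does not settle.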
The “dbkeeper” would similarly take one of the file dumps, move it into the RDBMS, and switch.
You would have to set up communication between those workers around those switching activities. Each queue is either receiving or dumping at any given time, which minimizes concurrent accesses (both their frequency and their duration).
You may want to read about the Mailinator design: its maintainer refactored his software a few times after hitting one bottleneck or another. That would also give you an estimate of when those bottlenecks start to affect performance.
But really, why not use some ready-made, tested server instead?
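As an illustration of that step, a sketch that loads one dump file into a database in a single transaction and deletes the file only after the commit succeeded. SQLite (via Python's `sqlite3`) stands in for whatever RDBMS you pick; the table name `outbox`, the function names, and the length-prefixed record format are all assumptions.

```python
import os
import sqlite3
import struct


def iter_records(path):
    """Yield messages from a file of 4-byte-length-prefixed records."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                return
            (size,) = struct.unpack(">I", header)
            yield f.read(size)


def load_dump(conn, path):
    """Move one dump file into the database; return the number of rows."""
    rows = [(rec,) for rec in iter_records(path)]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox "
        "(id INTEGER PRIMARY KEY, body BLOB NOT NULL)"
    )
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany("INSERT INTO outbox (body) VALUES (?)", rows)
    os.remove(path)  # reached only after a successful commit
    return len(rows)
```

If the process dies between the commit and the delete, the same file would be loaded twice on restart, so a real implementation would need some way to detect already-imported dumps.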