I’m trying to figure out a design pattern that was mentioned in an erlang talk.
Essentially the speaker mentions using a work queue using a “message as a process” rather then using the job as a process.
The key idea being that by using a “message as a process” you are able to save serialization/deserialization overhead.
Thanks
Let
Mbe an Erlang term() which is a message we send around in the system. One obvious way to handleMis to build a pipeline of processes and queues.Mis processed by the first worker in the pipeline and then sent on to the next queue. It is then picked up by the next worker process, processed again and put into a queue. And so on until the message has been fully processed.The perhaps not-so-obvious way is to define a process
Pand then handMtoP. We will notate it asP(M). Now the message itself is a process and not a piece of data.Pwill do the same job that the workers did in the queue-solution but it won’t have to pay the overhead of sticking theMback into queues and pick it off again and so on. When the processingP(M)is done, the process will simply end its life. If handed another messageM'we will simply spawnP(M')and let it handle that message concurrently. If we get a set of processes, we will do[P(M) || M <- Set]and so on.If
Pneeds to do synchronization or messaging, it can do so without having to “impersonate” the message, since it is the message. Contrast with the worker-queue approach where a worker has to take responsibility for a message that comes along it. IfPhas an error, only the messageP(M)affected by the error will crash. Again, contrast with the worker-queue approach where a crash in the pipeline may affect other messages (mostly if the pipeline is badly designed).So the trick in conclusion: Turn a message into a process that becomes the message.
The idiom is ‘One Process per Message’ and is quite common in Erlang. The price and overhead of making a new process is low enough that this works. You may want some kind of overload protection should you use the idea however. The reason is that you probably want to put a limit to the amount of concurrent requests so you control the load of the system rather than blindly let it destroy your servers. One such implementation is Jobs, created by Erlang Solutions, see
https://github.com/esl/jobs
and Ulf Wiger is presenting it at:
http://www.erlang-factory.com/conference/ErlangFactoryLiteLA/speakers/UlfWiger
As Ulf hints in the talk, we will usually do some preprocessing outside
Pto parse the message and internalize it to the Erlang system. But as soon as possible we will make the messageMinto a job by wrapping it in a process (P(M)). Thus we get the benefits of the Erlang Scheduler right away.There is another important ramification of this choice: If processing takes a long time for a message, then the preemptive scheduler of Erlang will ensure that messages with less processing needs still get handled quickly. If you have a limited amount of worker queues, you may end up with many of them being clogged, hampering the throughput of the system.