Trying a mental reset here: I tried to create a reliable, persistent stack with MSMQ, didn’t work
So in more general terms:
I have producer (a webservice, so multithreaded although “only one”) / consumer (multiple processes, as many as needed) setup. The key problems are
– The data needs to be consumed/processed in LIFO order (~> stack)
– The data needs to be stored/handled in a reliable way (i.e. backed by a disk, message queue, whatever). Bonus points for transaction support.
– Interprocess communication involved
Given the points above I struggle to find a neat solution. What I looked at:
-
Do it yourself
Didn’t really plan to do that, but initial proof of concepts for this just confirmed that this is hard (for me) and helped me get a better grasp on the many hurdles involved. -
MSMQ
Would be nice and easy, since it lends itself easily to “reliable”, is easy to set up and already part of the target infrastructure. Unfortunately “LIFO”/”Stack” is a killer here. That seems to be impossible to do -> Bzzzt. -
Database (SQL Server)
I tried to look at a DB based approach, but there are lots of ugly things involved:- I’d need to store my data as a blob (since it doesn’t lend itself to a column based store easily)
- Polling a database for work just seems wrong (is it?)
- Locking with multiple consumers seems to be tricky..
Any suggestion for a technology that I should evaluate? The database based approach seems to be the most “promising” so far, but I still haven’t found good examples/success stories of similar usecases.
Updates
- Windows only
- For now, I don’t even need to do inter-machine communication (i.e. producer/consumer probably will be on one machine for now)
- The key part in the question, the difficult task for me is: I cannot lose a job/message, even if all processes go down. A DB would give me that “for free”, message queues can be set up to be reliable. Map/reduce, while interesting, isn’t solving the core issue: How do I make sure that messages/jobs aren’t lost?
I’d go with SQL Server for this.
Obviously you’d have to serialize your data to a blob, but any solution would have to do this (at least behind the scenes). You would then just have a table like
CREATE TABLE Stack (Id int identity, Data varbinary(MAX))Polling the database isn’t necessary. SQL Server has a query notification service where you just give it a query and it will notify you when the results would be different. Your notification query would just be
SELECT * FROM StackLocking is the database’s problem, not yours. You would just have every consumer run a query (or stored procedure) that uses a transaction to return the most recent entry (the row with the highest Id) and delete it at the same time. If the query returns a result, process it and run it again. If the query returns no results, see #2.
Here’s a sample query:
Here’s a more elegant version that doesn’t even require an explicit transaction:
If you want to do batch processing of 10 items at a time, you would use SQL like this: