I’ve got the following theoretical problem.
I’d like to be able to create multiple instances of every role in my Azure application. At least 2, to make sure it’s working 24/7 (or nearly 24/7).
This is easy for client front-ends and roles which listen for data.
It’s also easy for worker roles which process data and save results in blobs / table storage, assuming each worker role processes its own set of data to create a single result.
But I’m having a problem with the “middle” of this application.
In the simplest case, if I could create an Azure storage “distinct” queue (no two messages are identical), then it would be easy. But as far as I know, this cannot be done, especially since you can download only 32 messages at a time.
So, I’d need some sort of controller role… I think. This role would ensure the worker roles wouldn’t interfere with each other.
BUT, this means the controller role cannot have more than one instance!
How should I approach this problem?
EDIT:
I see I may have given too little information regarding my problem.
I am aware how Azure queues work; I know that once a message is dequeued it won’t be available for other roles for a period of time or until it is deleted.
The thing is, I cannot just populate the queue with random, unique data, like GUIDs.
Here’s a more down-to-earth explanation of the problem.
- Listener roles listen for client data, and place that data into a base table, then notify, via a queue, that a specific part of that table has been altered.
- Listener roles may report that the same part of the table has been modified multiple times – they don’t track what’s going on with the queue, and what’s already in it, so they just send out messages along the lines of “hey, I modified this bit for this client” (for example).
- Workers create or refresh specific files (blobs mostly) based on the messages provided by the listeners. Currently, there’s a single worker, which downloads the entire queue, checks for distinct messages, and then refreshes all the files as needed. Adding a second worker causes problems: since the queue usually DOESN’T contain distinct values, two workers could start working on the same output file, and interrupt each other.
Thus, I’m looking for a queue that is distinct, or at least provides a way to check whether a specific message has already been placed in it.
If the Service Bus is the only way to go, then so be it, but it would be nice to make the solution as simple as possible. 😉
Service Bus queues do have the concept of unique messages, via duplicate detection (which may be what you mean by distinct). There is a nice comparison between the two types of queue here:
http://preps2.wordpress.com/2011/09/17/comparison-of-windows-azure-storage-queues-and-service-bus-queues/
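To make the "unique messages" idea concrete, here is a minimal in-memory sketch in Python. It is not the Service Bus API: `DedupQueue`, `send`, `receive` and the `"client-42/part-7"` id format are all made-up names for illustration. The assumption is that each listener derives the message id from the thing it modified, so repeated notifications for the same part collapse into one message.

```python
import time
from collections import deque


class DedupQueue:
    """In-memory stand-in for a queue with duplicate detection.

    Hypothetical class for illustration only, not the Service Bus API.
    """

    def __init__(self, window_seconds=600.0, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock
        self._seen = {}           # message_id -> time it was accepted
        self._messages = deque()  # pending (message_id, body) pairs

    def send(self, message_id, body):
        """Enqueue unless the same message_id was accepted within the window."""
        now = self._clock()
        # Forget ids whose detection window has expired.
        self._seen = {m: t for m, t in self._seen.items()
                      if now - t < self._window}
        if message_id in self._seen:
            return False  # duplicate, silently dropped
        self._seen[message_id] = now
        self._messages.append((message_id, body))
        return True

    def receive(self):
        """Return the oldest pending message, or None if the queue is empty."""
        return self._messages.popleft() if self._messages else None


if __name__ == "__main__":
    queue = DedupQueue(window_seconds=600)
    print(queue.send("client-42/part-7", "refresh"))  # accepted
    print(queue.send("client-42/part-7", "refresh"))  # dropped as duplicate
```

One thing to watch in this sketch: an id is remembered for the whole window even after the message has been received, so a genuinely new change to the same part inside the window is dropped too. As far as I know the real duplicate detection behaves the same way, so the window would need to be short compared to how fresh the output files must be.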
The problem of having a single process in a resilient infrastructure can be solved using locking. You can use blob storage as a lock, as demonstrated by smarx’s blob post:
http://blog.smarx.com/posts/managing-concurrency-in-windows-azure-with-leases
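The locking pattern can be sketched roughly like this, again in plain Python with an in-memory stand-in instead of real blob storage. `LeaseStore`, `try_acquire`, `release` and `refresh_file` are hypothetical names, not SDK calls. The assumption is one lease per output file, so two workers never rebuild the same file at once; with real blob leases the storage service makes the acquire atomic, which the mutex imitates here.

```python
import threading
import time
import uuid


class LeaseStore:
    """In-memory stand-in for blob leases: one holder per name until expiry.

    Hypothetical class for illustration only, not the storage SDK.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._mutex = threading.Lock()  # imitates the server-side atomicity
        self._leases = {}               # name -> (lease_id, expires_at)

    def try_acquire(self, name, duration=60.0):
        """Return a new lease id, or None if someone else holds a live lease."""
        with self._mutex:
            now = self._clock()
            current = self._leases.get(name)
            if current is not None and current[1] > now:
                return None
            lease_id = uuid.uuid4().hex
            self._leases[name] = (lease_id, now + duration)
            return lease_id

    def release(self, name, lease_id):
        """Release the lease only if lease_id is the current holder."""
        with self._mutex:
            current = self._leases.get(name)
            if current is not None and current[0] == lease_id:
                del self._leases[name]
                return True
            return False


def refresh_file(store, output_name, rebuild):
    """Rebuild output_name only if no other worker currently holds its lease."""
    lease_id = store.try_acquire(output_name)
    if lease_id is None:
        return False  # another worker is already on this file
    try:
        rebuild(output_name)
    finally:
        store.release(output_name, lease_id)
    return True
```

The expiry matters: if a worker dies while holding a lease, the file becomes available again once the duration runs out, so no single instance can block the others forever. The same trick would let you run two instances of a controller role and have only the lease holder act.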
However, I would reconsider your whole architecture, and try to reduce the number of roles you’re using. I set out some arguments for this in a recent blog post:
http://coderead.wordpress.com/2012/03/23/three-tier-architecture-in-the-cloud/