Situation
Users can upload documents; a message containing the document's ID is then placed on a queue. The Worker Role picks this message up, fetches the document, and parses it completely with Lucene. After parsing is complete, the Lucene IndexSearcher on the Web Role should be updated.
On the Web Role I'm keeping a static Lucene IndexSearcher, because otherwise I would have to create a new IndexSearcher for every search request, which adds a lot of overhead.
What I want to do is send a notice from the Worker Role to the Web Role that it needs to update its IndexSearcher.
Possible Solutions
- Make some sort of notice queue. The Web Role starts an endless task that keeps checking the notice queue. If it finds a message, it updates the IndexSearcher.
- Start a WCF service on the Worker Role and connect to it from the Web Role. Do a callback from the Worker Role to tell the Web Role through the service that it needs to update its IndexSearcher.
- Just update it at a regular interval.
What would be the best solution, or is there another solution for this?
Many thanks!
If your worker roles write each finished job's details to a table using a PK of something like (DateTime.MaxValue - DateTime.UtcNow).Ticks.ToString("d19"), you will have a list of the processed jobs sorted newest first. Set your web role to poll the table for entries newer than the last one it processed.
For the worker roles that do the indexing work, this is great because they can write indiscriminately to the table without worrying about conflicts. You also get an audit log of the jobs they are processing (assuming you put some details in there).
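The reverse-tick key and the "newer than my last check" cutoff can be sketched roughly as below. This is only an illustration under assumptions: the class and method names (ReverseTickKeys, KeyFor, NewerThan) are hypothetical, and an in-memory sorted list stands in for the table. Against a real table the same cutoff would typically be expressed as a "less than" filter on the key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any Azure SDK.
public static class ReverseTickKeys
{
    // Newer timestamps give smaller tick differences, so their
    // zero-padded 19-digit keys sort first in ordinal string order.
    public static string KeyFor(DateTime utc) =>
        (DateTime.MaxValue - utc).Ticks.ToString("d19");

    // Given keys already sorted ascending (as a table scan would return them),
    // yield only the keys of jobs that finished after lastIndexTime.
    public static IEnumerable<string> NewerThan(
        IEnumerable<string> sortedKeys, DateTime lastIndexTime)
    {
        string cutoff = KeyFor(lastIndexTime);
        return sortedKeys.TakeWhile(k => string.CompareOrdinal(k, cutoff) < 0);
    }
}
```

Because the keys are fixed-width digit strings, plain string comparison matches numeric comparison, which is what makes the range query cheap.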
However, you have one remaining problem: it sounds like you have one web role that updates the index. This one web role can of course poll the table at whatever frequency you choose (just track the LastIndexTime so later polls know where to start). The issue is how to control concurrency if you have more than one web role instance. Does each web role maintain its own index, or do you have one stored somewhere for all of them? Sorry if that should be obvious, but I am not an expert in Lucene.
Anyhow, if you have multiple instances in your WebRole and a single index that all of them can see, you need to prevent multiple instances from updating the index over and over. You can do this by leasing the index (if it is stored in blob storage), since only the instance that holds the lease can write to the blob.
Update based on comment:
If each WebRole instance has its own index, then you don't have to worry about leasing. Leasing only matters if the instances share a blob resource. So this technique should work fine as-is, and your only potential obstacle is that the polling intervals of the web role instances could be slightly out of sync, causing somewhat different results until all of them have updated (depending on which instance you hit). Poll the table every 30 seconds and that will be the maximum time the instances are out of sync. Each web role instance simply needs to track the last time it updated and do incremental searches from that point.
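A per-instance poller along those lines might look like the sketch below. Everything here is an assumption made for illustration: IndexRefresher is a hypothetical class, hasJobsSince stands in for the table query, and openSearcher stands in for whatever reopens the Lucene IndexSearcher. No Lucene or Azure calls are shown.

```csharp
using System;
using System.Threading;

// Hypothetical per-instance refresher; TSearcher is whatever object the web role keeps static.
public sealed class IndexRefresher<TSearcher> : IDisposable where TSearcher : class
{
    private readonly Func<DateTime, bool> _hasJobsSince; // assumed: true if jobs finished after the given time
    private readonly Func<TSearcher> _openSearcher;      // assumed: builds a fresh searcher over the index
    private TSearcher _current;
    private DateTime _lastIndexTime = DateTime.MinValue;
    private int _polling;                                // 1 while a poll is in progress
    private Timer _timer;

    public IndexRefresher(Func<DateTime, bool> hasJobsSince, Func<TSearcher> openSearcher)
    {
        _hasJobsSince = hasJobsSince;
        _openSearcher = openSearcher;
        _current = openSearcher();
    }

    // Search requests read the current searcher; they never create one.
    public TSearcher Current => Volatile.Read(ref _current);

    // For example, Start(TimeSpan.FromSeconds(30)).
    public void Start(TimeSpan interval) =>
        _timer = new Timer(_ => PollOnce(), null, interval, interval);

    // Returns true if the searcher was replaced.
    public bool PollOnce()
    {
        // Skip this tick if the previous poll is still running.
        if (Interlocked.Exchange(ref _polling, 1) == 1) return false;
        try
        {
            DateTime checkedAt = DateTime.UtcNow; // taken before the query so no job is missed
            if (!_hasJobsSince(_lastIndexTime)) return false;
            // Swap in the new searcher; disposing the old one is left out of this sketch.
            Volatile.Write(ref _current, _openSearcher());
            _lastIndexTime = checkedAt;
            return true;
        }
        finally
        {
            Volatile.Write(ref _polling, 0);
        }
    }

    public void Dispose() => _timer?.Dispose();
}
```

Each instance keeps its own _lastIndexTime, so the instances need no coordination; the cost is the short window in which they can serve different results.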