There are a few threads floating around on the topic, but I think my use-case is somewhat different.
What I want to do:
- Full text search component for my GAE/J app
- The index size is small: 25-50MB or so
- I do not need live updates to the index, a periodic re-indexing is fine
- This is for auto-complete and the like, so it needs to be extremely fast (I get the impression that implementing an inverted index in Datastore introduces considerable latency)
My strategy so far (just planning, haven’t tried implementing anything yet):
- Use Lucene with RAMDirectory
- A periodic cron job creates the index, serializes it to the Datastore, stores an update id (or timestamp)
- Search servlet loads the index on startup and creates the RAMDirectory
- On each request the servlet checks the current update id and reloads the index as necessary
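The reload-on-version-change step above could be sketched roughly as follows. This is only a minimal sketch under assumptions: `VersionedIndexHolder`, `currentVersion` and `loader` are hypothetical names, the update id is assumed to increase monotonically, and the Lucene and Datastore specifics are left behind the two callbacks rather than spelled out.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

/** Per-instance cache that reloads its value when a shared update id moves forward. */
final class VersionedIndexHolder<T> {

    /** Immutable pair of a loaded index and the update id it was built from. */
    private static final class Snapshot<T> {
        final long version;
        final T index;

        Snapshot(long version, T index) {
            this.version = version;
            this.index = index;
        }
    }

    // Assumption: reads the current update id from shared storage (Datastore, memcache, ...).
    private final LongSupplier currentVersion;
    // Assumption: loads and deserializes the index stored for the given update id.
    private final LongFunction<T> loader;
    private final AtomicReference<Snapshot<T>> cached = new AtomicReference<>();

    VersionedIndexHolder(LongSupplier currentVersion, LongFunction<T> loader) {
        this.currentVersion = currentVersion;
        this.loader = loader;
    }

    /** Returns the cached index, reloading it first if a newer update id is visible. */
    T get() {
        long latest = currentVersion.getAsLong();
        Snapshot<T> snap = cached.get();
        if (snap != null && snap.version >= latest) {
            return snap.index; // fast path: cached copy is current
        }
        synchronized (this) {
            // Re-check under the lock so concurrent requests trigger only one reload.
            snap = cached.get();
            if (snap == null || snap.version < latest) {
                snap = new Snapshot<>(latest, loader.apply(latest));
                cached.set(snap);
            }
            return snap.index;
        }
    }
}
```

With something like this, each instance keeps its own in-memory copy and the only shared state is the update id, so instances are not synchronized with each other directly; each one picks up a new index on the first request after the id changes. The version check still costs one lookup per request, so it would probably need to be cheap (or throttled) to keep auto-complete fast.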
The main thing I’m fuzzy on is how to synchronize in-memory data between instances – will this work, or am I missing something?
Also, how far can I push it before I start having problems with memory use? I couldn’t find anything on RAM quotas for GAE. (This index is small, but I can think of more stuff I’d like to add.)
And, of course, any thoughts on better approaches?
Well, as of GAE 1.5.0, it looks like resident Backends can be used to create a search service.
Of course, there’s no free quota for these.