I know this is a very vague question, but I’m looking for a very abstract answer. Since I started using GAE a few months ago, I’ve always shrugged off memcache as an unnecessary hassle that isn’t really that important. But memcache seems to be praised as a highly beneficial feature. Google even says “High performance scalable web applications often use a distributed in-memory data cache in front of or in place of robust persistent storage for some tasks.”, so I thought there must be something worth looking into.
I just don’t get it. How is it good for performance? First you have to check if something is in memcache, and if not, query. I always thought it would just be quicker to not have to deal with that, and just query anyway, but it seems this may be a naive approach? How much of a difference does this make?
I guess what I never understood is where memcache is useful. I can see how it helps on, say, the Stack Overflow home page, where all users see pretty much the same thing; it would be silly not to use memcache in a situation like that. But take a social network like Facebook. Every user sees something different. No two people see the same data and content, and things change so fast that memcache would probably need to be updated constantly. What role can memcache play in such a scenario?
Also, in a private website like a social network, how much can memcache really fit if every user has to store different information in memcache? I know GAE doesn’t speak of the size of its memcache, so would it be safe to store hundreds of thousands of records?
You use memcache for things you’re likely to need frequently. Checking to see if something is in memcache is fast. Getting it from memcache is fast. If you can save yourself having to do a query, you save a ton of time.
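For instance, consider the following two scenarios, each with two requests for the same data:
1. Both requests query the DB directly.
2. The first request checks memcache, misses, queries the DB, and stores the result in memcache. The second request finds the result in memcache.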
Ballpark time to get something from memcache (or store it) is roughly 2-3ms. Ballpark time to get something from the DB is about 100ms.
So for the first scenario, you have 100ms x2 = 200ms total.
In the second scenario, you have 3ms (failed lookup) + 100ms (query) + 3ms (store) + 3ms (successful lookup) = 109ms total.
You’ve saved about 45% overall (109ms versus 200ms).
Now consider that maybe 10 people request your homepage. Each additional person in the first scenario would be another 100ms. Each person in the second scenario is only 3ms.
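The arithmetic above can be sketched as a tiny model. This is only an illustration using the ballpark figures from this answer (3ms per memcache operation, 100ms per DB query); the function names are made up for the example, and real latencies will vary.

```python
# Ballpark latencies taken from the answer above; real numbers will differ.
MEMCACHE_MS = 3
DB_MS = 100


def total_without_cache(requests):
    """Every request pays for a full DB query."""
    return requests * DB_MS


def total_with_cache(requests):
    """First request: failed lookup + query + store. Later requests: one lookup."""
    if requests == 0:
        return 0
    first = MEMCACHE_MS + DB_MS + MEMCACHE_MS
    return first + (requests - 1) * MEMCACHE_MS


for n in (2, 10):
    print(n, total_without_cache(n), total_with_cache(n))
# 2 requests:  200ms without cache, 109ms with cache
# 10 requests: 1000ms without cache, 133ms with cache
```

The more often the same data is requested, the larger the gap gets, because the one-off cost of the miss is spread over every later hit.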
Also note that you don’t have to store an entire page at a time in memcache. You can store parts of pages as well. Sure, not all users have all the same data, but there are certainly some things that are shared between them.
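As a rough sketch of caching part of a page, the example below uses a plain in-process dictionary as a stand-in for memcache so that it runs anywhere. FakeCache, get_fragment, query_top_questions, and render_home are all hypothetical names invented for this illustration; on GAE you would use the real memcache client in place of FakeCache, and you would normally also set an expiry on stored values.

```python
class FakeCache:
    """In-process stand-in for a memcache client (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Returns None on a miss, like a typical cache client.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


cache = FakeCache()
db_queries = 0  # counts how often the "expensive" query actually runs


def query_top_questions():
    """Pretend DB query for data that every user sees."""
    global db_queries
    db_queries += 1
    return ["Question A", "Question B"]


def get_fragment(key, compute):
    """Check the cache first; on a miss, compute the value and store it."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value)
    return value


def render_home(user_name):
    # The shared part comes from the cache; the per-user part is built fresh.
    shared = get_fragment("home:top_questions", query_top_questions)
    return "Hello %s: %s" % (user_name, ", ".join(shared))


print(render_home("alice"))
print(render_home("bob"))
print(db_queries)  # the query ran once, even though two users were served
```

Each user still gets a personalised page, but the shared fragment is only queried once until it is evicted or replaced.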