Memcache is one of those things where the solution could be absolutely anything, and no one ever really gives a decent answer, maybe because there is none. So I’m not looking for a direct answer, but maybe just something to get me going in the right direction.
For a typical request, here is my AppStats info:

So, out of a 440 ms request, I spend 342 ms in memcache. I thought memcache was supposed to be lightning fast, so I must be doing something wrong.
My memcache statistics in the admin console show:
Hit count: 3848
Miss count: 21382
Hit ratio: 15%
I’m no expert on this stuff, but I’m pretty sure 15% is terrible.
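As a quick check of that figure, the hit ratio is hits / (hits + misses). The snippet below only redoes that arithmetic with the counts from the admin console; nothing App Engine specific is involved.

```python
# Memcache statistics as reported in the admin console (values from the question).
hit_count = 3848
miss_count = 21382


def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of memcache lookups that found a cached value."""
    total = hits + misses
    return hits / total if total else 0.0


ratio = hit_ratio(hit_count, miss_count)
print(f"hit ratio: {ratio:.1%}")  # hit ratio: 15.3%
```

In other words, roughly five or six out of every six or seven lookups miss and fall through to the Datastore.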
The typical request above is too involved to explain in full, but basically I create and put a new entity, which in turn updates and puts a parent entity, which in turn updates and puts every user associated with that parent.
Throughout all this, I always get by key and never query. Oh, and I’m using NDB, so all the basic memcache handling is automatic; I never touch memcache directly in my own code.
Any ideas?
Edit: Here is the breakdown of my request:

So I only have 2 datastore gets and 2 puts; everything else is memcache traffic that NDB handles automatically. Why is it doing so much work? Would I be better off handling this manually?
Let’s take a closer look at your data. Seven memcache writes took as much time as two datastore writes. That actually shows that, per write, memcache is roughly 3.5 times faster than the Datastore.
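The “3.5 times” figure comes from treating the two groups of writes as taking the same total time T (a simplification of the timeline): each memcache write then costs T/7 and each datastore write T/2, so a memcache write is (T/2) / (T/7) = 7/2 = 3.5 times quicker. A tiny sketch of that arithmetic:

```python
# Simplified reading of the request breakdown: 7 memcache writes and
# 2 datastore writes took about the same wall-clock time in total.
memcache_writes = 7
datastore_writes = 2


def relative_speed(n_fast: int, n_slow: int) -> float:
    """If n_fast ops and n_slow ops take the same total time T, each fast op
    costs T / n_fast and each slow op costs T / n_slow, so a fast op is
    (T / n_slow) / (T / n_fast) = n_fast / n_slow times quicker."""
    return n_fast / n_slow


print(relative_speed(memcache_writes, datastore_writes))  # 3.5
```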
If a typical request to your application updates at least three entities, followed by updates to even more entities (the associated users), you can’t make this operation “lightning fast.” Memcache helps when you read entries much more often than you write them. If reads and writes to a User record happen about equally often, consider turning the cache off for that model.
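To see why the read/write balance matters, here is a toy simulation of a cache that is invalidated on every write. It is a simplified model chosen for illustration, not NDB’s exact caching protocol: each write updates the store and drops the cached copy, and each read checks the cache first and refills it on a miss. With one read per write, every read misses; only when several reads follow each write does the cache start paying off. (In NDB itself, memcache can be switched off per model by setting the class attribute `_use_memcache = False`.)

```python
from typing import Dict


def simulate(reads_per_write: int, rounds: int = 100) -> float:
    """Hit ratio of a cache that is invalidated on every write.

    Simplified model (an assumption, not NDB's real protocol): a write goes
    to the store and deletes the cached copy; a read checks the cache first
    and, on a miss, loads the value from the store and caches it.
    """
    store: Dict[str, int] = {}
    cache: Dict[str, int] = {}
    hits = misses = 0
    key = "user:1"  # hypothetical entity key
    for version in range(rounds):
        # Write: update the store and invalidate the cached entry.
        store[key] = version
        cache.pop(key, None)
        # Reads that happen before the next write.
        for _ in range(reads_per_write):
            if key in cache:
                hits += 1
            else:
                misses += 1
                cache[key] = store[key]
    total = hits + misses
    return hits / total if total else 0.0


for r in (1, 2, 10):
    print(f"{r} read(s) per write -> hit ratio {simulate(r):.0%}")
```

Under this model the hit ratio is (r - 1) / r for r reads per write, so a write-heavy model like the one in the question keeps the cache mostly cold while still paying for the cache traffic.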
You can also try asynchronous operations and task queues. First, from your description it looks like you update the entity and only then, once that put completes, update its parent, because that is the natural order. You can run these concurrently instead; this will probably require some refactoring, but it’s worth it.
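A minimal sketch of the difference, using `asyncio` with a sleep as a stand-in for a datastore RPC. `put_entity` and the 50 ms latency are hypothetical; in NDB the corresponding calls are `put_async()` on a model instance or `ndb.put_multi_async()`, which return futures you resolve later with `get_result()`. Issued one after the other, the two puts cost about two round trips; issued concurrently, about one.

```python
import asyncio
import time

RPC_LATENCY = 0.05  # seconds; a made-up figure for illustration


async def put_entity(name: str) -> str:
    """Hypothetical stand-in for an async datastore put."""
    await asyncio.sleep(RPC_LATENCY)  # simulate the network round trip
    return name


async def sequential() -> float:
    start = time.perf_counter()
    await put_entity("child")   # wait for the child put...
    await put_entity("parent")  # ...before starting the parent put
    return time.perf_counter() - start


async def concurrent() -> float:
    start = time.perf_counter()
    # Start both puts and wait for them together.
    await asyncio.gather(put_entity("child"), put_entity("parent"))
    return time.perf_counter() - start


seq = asyncio.run(sequential())
conc = asyncio.run(concurrent())
print(f"sequential: {seq:.3f}s, concurrent: {conc:.3f}s")
```

This only works if the parent update does not need the result of the child put (for example, an auto-allocated key), which is where the refactoring comes in.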
Second, updating “all the associated users” could perhaps be deferred to a task spawned in the background; Task Queues have a very convenient interface for this. The associated users won’t be updated immediately, but they probably don’t need to be! The latency of your request, however, will go down.
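The shape of that change, sketched with the standard library’s `queue` and a worker thread standing in for a Task Queue. `handle_request`, `update_user` and the `users` dict are hypothetical placeholders; on App Engine the enqueue step would go through the Task Queue API instead (for example `deferred.defer(...)` from `google.appengine.ext.deferred`). The handler enqueues one task per associated user and returns right away; the updates happen afterwards.

```python
import queue
import threading
from typing import Dict, List

# Hypothetical in-memory "datastore" of users, standing in for real entities.
users: Dict[str, int] = {"alice": 0, "bob": 0}
tasks: queue.Queue = queue.Queue()


def update_user(user_id: str) -> None:
    """Work that can safely happen after the response has been sent."""
    users[user_id] += 1


def worker() -> None:
    # Background consumer, playing the role of the task queue.
    while True:
        user_id = tasks.get()
        try:
            update_user(user_id)
        finally:
            tasks.task_done()


def handle_request(associated: List[str]) -> str:
    # ... put the new entity and its parent here ...
    for user_id in associated:
        tasks.put(user_id)  # enqueue instead of updating inline
    return "200 OK"  # respond without waiting for the user updates


threading.Thread(target=worker, daemon=True).start()
print(handle_request(["alice", "bob"]))
tasks.join()  # demo only: wait for the background work before printing
print(users)
```

The `tasks.join()` call is only there so the demo can print the final state; a real request handler would not wait on it.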