I am playing with GAE and have made a very compact application that receives tiny requests and returns tiny responses at a constant rate (the requests come from a program that calls cURL in a loop).
It has no UI and is not meant to be called from a browser; it simply receives POST requests, does some processing and outputs a small amount of data as ASCII text.
I’ve optimized my app a bit so that the average latency is usually 20-30 ms, and it has been working great so far; because the latency is so low, a single instance could probably handle a dozen queries per second easily.
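Here is a rough back-of-the-envelope check of that headroom. It assumes one instance serves requests strictly one at a time with no queueing, which is a simplification, and `serial_capacity_rps` is a hypothetical helper. Under those assumptions, 20-30 ms per request allows roughly 33-50 requests per second. At the 20,000-30,000 ms latencies seen during the spike, the same instance could finish only a small fraction of a request per second.

```python
def serial_capacity_rps(latency_ms: float) -> float:
    """Requests per second a single instance can complete if it handles
    requests strictly one after another (hypothetical simplification)."""
    return 1000.0 / latency_ms


# Normal latencies from this post versus the latencies during the spike.
for latency in (20, 30, 20_000, 30_000):
    print(f"{latency:>6} ms/request -> {serial_capacity_rps(latency):.2f} req/s")
```

With requests arriving at a constant rate, a drop in per-instance capacity like this means requests pile up. That pile-up could plausibly cause more instances to be started, but this arithmetic alone does not show why latency rose in the first place.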
This morning, however, for about 40 minutes, latency spiked badly and the application started taking 20,000 to 30,000 ms to handle a request; see here: https://i.stack.imgur.com/AYxmv.png
Neither the GAE app code nor the program that makes the requests was changed during this time.
How can I find out what caused this, and whether it will happen again in the future?
I checked the logs and nothing looked wrong, and there is no way to contact Google about this.
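The server side offers little visibility here, but the program sending the requests can measure latency itself, so future spikes are detected and timestamped independently of Google's dashboard. The sketch below assumes the loop can wrap each request in a Python callable. `timed_ms`, `latency_report` and `send_request` are hypothetical names. The 1000 ms threshold is the limit the app needs to meet. The nearest-rank 99th percentile is just one common percentile definition among several.

```python
import statistics
import time


def timed_ms(call):
    """Run call() once and return its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    call()
    return (time.perf_counter() - start) * 1000.0


def latency_report(samples_ms, threshold_ms=1000.0):
    """Summarize latency samples: median, nearest-rank p99, and how many
    samples exceeded threshold_ms."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    rank = (99 * n + 99) // 100  # ceil(0.99 * n) using integer arithmetic
    return {
        "count": n,
        "median_ms": statistics.median(ordered),
        "p99_ms": ordered[rank - 1],
        "over_threshold": sum(1 for s in ordered if s > threshold_ms),
    }


# Usage sketch (send_request is a hypothetical function issuing one POST):
# samples = [timed_ms(send_request) for _ in range(100)]
# print(latency_report(samples))
```

Logging each report with a timestamp would make it easier to line up a future spike with the App Engine charts.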
My app is very sensitive to latency: all requests should be handled as fast as possible, and certainly in under 1 second.
I set Min Pending Latency to 10 ms in the Admin panel of my application, but is there a way to reduce the maximum timeout of a request? I think it is 60 seconds by default.
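Whatever the server-side deadline turns out to be, the calling program can enforce its own 1-second limit and give up on slow requests. This does not stop the server from continuing to process them. The sketch below uses Python's standard `urllib`. `post_with_deadline` and any URL passed to it are hypothetical. Note that `timeout` applies to each blocking socket operation (connecting, each read), not to the request as a whole. A response that trickles in slowly could therefore still take longer than `timeout_s` in total.

```python
import socket
import urllib.error
import urllib.request


def post_with_deadline(url, body, timeout_s=1.0):
    """POST body to url; return (status, text), or None if a socket
    operation exceeds timeout_s. Other errors are re-raised."""
    req = urllib.request.Request(url, data=body.encode("ascii"), method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return resp.status, resp.read().decode("ascii")
    except socket.timeout:
        # Raised directly when waiting for or reading the response times out.
        return None
    except urllib.error.URLError as err:
        # Connect/send timeouts arrive wrapped in URLError.
        if isinstance(err.reason, socket.timeout):
            return None
        raise
```

If the loop keeps using the cURL command line instead, `curl --max-time 1` caps the whole transfer at one second. That flag bounds total time, not each individual socket operation.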
Edit: Here are other charts, where we can see that ‘API calls CPU’ and ‘Active instances’ were affected, but I am not sure what that tells me about what went wrong…
Edit 2: Here are some log entries for requests made during the problematic period:
69.165.137.199 - - [23/Nov/2011:06:56:11 -0800] "POST / HTTP/1.1" 200 287 - - "app.appspot.com" ms=36378 cpu_ms=258 api_cpu_ms=98 cpm_usd=0.007259 instance=00c61b117cd98a4e8f9d6c0215d5e14c3336
69.165.137.199 - - [23/Nov/2011:06:55:32 -0800] "POST / HTTP/1.1" 200 287 - - "app.appspot.com" ms=34584 cpu_ms=125 api_cpu_ms=98 cpm_usd=0.003555 instance=00c61b117cd98a4e8f9d6c0215d5e14c3336
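The two entries above already hint at something. The wall-clock `ms` is about 36 s and 35 s, while `cpu_ms` is only 258 and 125. If `cpu_ms` counts CPU time, as its name suggests, then nearly all of each request's time was spent waiting rather than computing, though the log does not say on what. This sketch pulls those fields out of every log line from the window to make the gap easy to see. It assumes the `key=value` layout shown in these entries. `timing_fields` and `non_cpu_ms` are hypothetical helpers.

```python
import re

# Matches the key=value timing fields in the App Engine request log layout.
# \b keeps "ms" from matching the tail of "cpu_ms" or "api_cpu_ms".
_TIMING = re.compile(r"\b(ms|cpu_ms|api_cpu_ms)=(\d+)\b")


def timing_fields(log_line):
    """Return the ms, cpu_ms and api_cpu_ms fields of one log line as ints."""
    return {key: int(value) for key, value in _TIMING.findall(log_line)}


def non_cpu_ms(log_line):
    """Wall-clock time not accounted for by cpu_ms (assumed to be CPU time)."""
    fields = timing_fields(log_line)
    return fields["ms"] - fields["cpu_ms"]
```

Running `non_cpu_ms` over lines from before, during and after the 40-minute window would show whether the extra time tracks the spike.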
Here are some things to try: