I’ll start with a real-life example from my own case:
I have a website with around 60K page views per day (at peak time, about 100 active visitors on the site, according to Google Analytics real-time reports). The stack is Django + MySQL + Apache on one Linode server with 1024 MB of RAM and a 4-core CPU, plus Amazon S3 for static files. It now seems to have hit a bottleneck: at peak time it responds slowly.
One thing I can see is memory: nearly 91% of it is consumed at peak time. But I’m not sure whether there are other bottlenecks as well.
Two questions:
1: Where: To solve this problem, I need to know exactly where it lies. How can I measure where the bottleneck is?
2: How: How can I solve the memory bottleneck? One way I can think of is simply to add more memory, or to add more machines behind a load balancer (which is of course more expensive…). Another option, though I’m not sure about it, might be to switch from Apache to nginx?
EDIT:
You can also see the memory usage reported by the top command here: 30 Apache processes, each consuming about 2 percent (20M) of the memory.
http://codepad.org/pUYdZhWq
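As a rough check of those figures, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate: it assumes the 20M per process is resident memory and ignores pages shared between Apache processes, so the real total is likely somewhat lower. The function name is made up for illustration.

```python
# Figures taken from the question and the top output described above.
APACHE_PROCESSES = 30
MB_PER_PROCESS = 20       # assumed to be resident memory per process
TOTAL_RAM_MB = 1024
PEAK_USAGE_PERCENT = 91   # overall memory usage reported at peak time


def apache_memory_share(processes, mb_per_process, total_mb):
    """Return (MB used by Apache, percentage of total RAM)."""
    used_mb = processes * mb_per_process
    return used_mb, used_mb / total_mb * 100


used_mb, share = apache_memory_share(APACHE_PROCESSES, MB_PER_PROCESS, TOTAL_RAM_MB)
# Whatever is left of the peak usage belongs to MySQL and everything else.
other_mb = TOTAL_RAM_MB * PEAK_USAGE_PERCENT / 100 - used_mb

print(f"Apache: {used_mb} MB ({share:.1f}% of RAM)")
print(f"Everything else at peak: about {other_mb:.0f} MB")
```

Under these assumptions Apache alone accounts for well over half of the RAM, which is why the number of Apache processes resident at peak time matters so much here.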
Without knowing a lot more about how your site behaves, it’s hard to make recommendations, and the list of questions that would need to be answered is huge. Put simply, paying a competent person to investigate the issue would cost you an awful lot more than putting another gig of memory in the server.
OTOH, if I were the person being paid to resolve the problem, one of the first things I’d look at is the speed of the server: turning requests around faster means fewer requests resident in memory at any moment, which means less memory used by the webserver (and more left for cache/buffers). So start measuring your response times (%D in the Apache log format, which records the time taken to serve each request in microseconds) and analysing the data. For any PHP code, make sure you’re using an opcode cache and enable output compression (ob_gzhandler()), and enable compression for CSS, JavaScript and HTML files on the webserver. Make sure you’ve stripped all unused modules from Apache and are using a sensible value for the keepalive timeout (2 seconds or less).
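Analysing the %D data could look something like the sketch below. It assumes (this is not from the question) that the access log uses the common log format with %D appended as the last field, for example via `LogFormat "%h %l %u %t \"%r\" %>s %b %D" timed`; the function names and the choice of median and 95th percentile are purely illustrative.

```python
import math
import re
import statistics
from collections import defaultdict

# Assumed layout: ... "METHOD /path HTTP/x.y" status bytes micros
# where the trailing number is %D (time to serve the request, in microseconds).
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ (?P<micros>\d+)$'
)


def parse_line(line):
    """Return (path, seconds) for a log line, or None if it does not match."""
    match = LINE_RE.search(line.strip())
    if match is None:
        return None
    return match.group("path"), int(match.group("micros")) / 1_000_000


def summarize(lines):
    """Aggregate response times overall and per request path."""
    per_path = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            path, seconds = parsed
            per_path[path].append(seconds)

    all_times = sorted(t for times in per_path.values() for t in times)
    if not all_times:
        return None

    # Nearest-rank 95th percentile.
    p95_index = math.ceil(0.95 * len(all_times)) - 1
    slowest = sorted(
        per_path.items(), key=lambda kv: statistics.mean(kv[1]), reverse=True
    )
    return {
        "count": len(all_times),
        "median": statistics.median(all_times),
        "p95": all_times[p95_index],
        "slowest_paths": [(path, statistics.mean(t)) for path, t in slowest[:5]],
    }
```

To use it, pass the open log file to `summarize()`, e.g. `with open("access.log") as f: print(summarize(f))` (the file name is a placeholder). The slowest paths by mean response time are usually the first place to look for slow queries or missing caching.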
If you’re running any COMET stuff then yes, you should definitely switch to nginx; otherwise the benefits are limited.