I have a site running on Amazon Elastic Beanstalk with the following traffic pattern:
- ~50 concurrent users normally.
- ~2000 concurrent users for 1-2 minutes when a post is made to our Facebook page.
Amazon Web Services claims to be able to scale rapidly to meet challenges like this, but the “Greater than x for more than 1 minute” setup of CloudWatch doesn’t appear to be fast enough for this traffic pattern.
Usually, within seconds, all the EC2 instances crash, killing all CloudWatch metrics, and the whole site is down for 4-6 minutes. So far I have yet to find a configuration that works for this scenario.
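To put rough numbers on why the alarm seems too slow, here is a back-of-the-envelope sketch. The 60-second alarm period matches the "more than 1 minute" setting above; the 180 seconds for a new instance to boot and pass health checks is purely an assumed figure, not something I have measured.

```python
def seconds_until_new_capacity(period_s, evaluation_periods, launch_s):
    """Rough delay from the start of a spike until new instances can serve traffic."""
    # The alarm must first observe the breach for the full evaluation window,
    # and only then does the new instance start launching.
    return period_s * evaluation_periods + launch_s


# Assumed numbers: one 60 s alarm period and a hypothetical 180 s launch time.
delay = seconds_until_new_capacity(period_s=60, evaluation_periods=1, launch_s=180)
spike_s = 120  # upper end of the 1-2 minute burst described above

print(delay, delay > spike_s)
```

Under those assumptions the extra capacity would arrive only after the burst is already over, which is consistent with what I am seeing.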
Here is the graph of a smaller event that also killed the site:

The suggestion from AWS was as follows:
So I think the best answer is to run more instances at lower traffic and use custom metrics to predict traffic from an external source. I am going to try, for example, monitoring Facebook and Twitter for posts with links to the site and scaling up straight away.
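As a minimal sketch of the "scale up straight away" part, something like the following could be triggered when the external monitor spots a new post. It assumes boto3 is installed and credentials are configured; the group name, the figure of 100 users per instance, and the floor of 2 instances are all hypothetical placeholders, not values from my setup.

```python
import math


def instances_needed(expected_users, users_per_instance, minimum=2):
    """Instances required for the expected load, never below a floor."""
    return max(minimum, math.ceil(expected_users / users_per_instance))


def prescale(group_name, expected_users, users_per_instance=100):
    """Raise the Auto Scaling group's desired capacity ahead of a spike."""
    # Imported here so the sizing logic above works without the AWS SDK.
    import boto3

    desired = instances_needed(expected_users, users_per_instance)
    boto3.client("autoscaling").set_desired_capacity(
        AutoScalingGroupName=group_name,  # hypothetical name supplied by the caller
        DesiredCapacity=desired,
        HonorCooldown=False,  # do not wait for a cooldown before acting
    )
    return desired


# Sizing for the two traffic levels mentioned above (assumed 100 users/instance).
print(instances_needed(50, 100), instances_needed(2000, 100))
```

Note that the desired capacity has to fall within the group's configured minimum and maximum size, so the maximum would need to be high enough for the spike.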