I’ve been looking at high availability solutions such as heartbeat, and keepalived to failover when an haproxy load balancer goes down. I realised that although we would like high availability it’s not really a requirement at this point in time to do it to the extent of the expenditure on having 2 load balancer instances running at any one time so that we get instant failover (particularly as one lb is going to be redundant in our setup).
My alternate solution is to fire up a new load balancer EC2 instance from an AMI if the current load balancer has stopped working and associate it to the elastic ip that our domain name points to. This should ensure that downtime is limited to the time it takes to fire up the new instance and associate the elastic ip, which given our current circumstance seems like a reasonably cost effective solution to high availability, particularly as we can easily do it multi-av zone. I am looking to do this using the following steps:
- Prepare an AMI of the load balancer
- Fire up a single ec2 instance acting as the load balancer and assign the Elastic IP to it
- Have a micro server ping the current load balancer at regular intervals (we always have an extra micro server running anyway)
- If the ping times out, fire up a new EC2 instance using the load balancer AMI
- Associate the elastic ip to the new instance
- Shut down the old load balancer instance
- Repeat step 3 onwards with the new instance
I know how to run the commands in my script to start up and shut down EC2 instances, associate the elastic IP address to an instance, and ping the server.
My question is what would be a suitable ping here? Would a standard ping suffice at regular intervals, and what would be a good interval? Or is this a rather simplistic approach and there is a smarter health check that I should be doing?
Also if anyone foresees any problems with this approach please feel free to comment
I understand exactly where you’re coming from, my company is in the same position. We care about having a highly available fault tolerant system however the overhead cost simply isn’t viable for the traffic we get.
That being said. I find the ELB’s provided by amazon are a better solution for me. I’m not sure what the reasoning is behind using HAProxy but I recommend investigating the ELB’s as they will allow you to do things such as auto-scaling etc.
For each ELB you create amazon creates one load balancer in each zone that has an instance registered. These are still vulnerable to certain problems during severe outages at amazon like the one described above. For example during this downtime I could not add new instances to the load balancers but my current instances ( the ones not affected by the power outage ) were still serving requests.
UPDATE 2013-09-30
Recently we’ve changed our infrastructure to use a combination of ELB and HAProxy. I find that ELB gives the best availability but the fact that it uses DNS load balancing doesn’t work well for my application. So our setup is ELB in front of a 2 node HAProxy cluster. Using this tool HAProxyCloud I created for AWS I can easily add auto scaling groups to the HAProxy servers.