I’ve been looking at high availability solutions such as heartbeat and keepalived to fail over

Question

Asked: June 7, 2026

I’ve been looking at high availability solutions such as heartbeat and keepalived to fail over when an HAProxy load balancer goes down. I realised that, although we would like high availability, it isn’t really a requirement at this point in time, at least not to the extent of paying for two load balancer instances running at all times so that we get instant failover (particularly as one of the load balancers would sit idle in our setup).

My alternative solution is to fire up a new load balancer EC2 instance from an AMI if the current load balancer has stopped working, and associate it with the Elastic IP that our domain name points to. This should limit downtime to the time it takes to start the new instance and associate the Elastic IP, which in our current circumstances seems like a reasonably cost-effective approach to high availability, particularly as we can easily do it across multiple availability zones. I am looking to do this using the following steps:

1. Prepare an AMI of the load balancer.
2. Fire up a single EC2 instance acting as the load balancer and assign the Elastic IP to it.
3. Have a micro server ping the current load balancer at regular intervals (we always have an extra micro server running anyway).
4. If the ping times out, fire up a new EC2 instance using the load balancer AMI.
5. Associate the Elastic IP with the new instance.
6. Shut down the old load balancer instance.
7. Repeat from step 3 onwards with the new instance.
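The monitoring loop in these steps (check the load balancer, launch a replacement from the AMI, move the Elastic IP, shut down the old instance, keep watching the new one) can be sketched as a small state machine. This is a minimal Python sketch under stated assumptions: `FailoverController` and its hook parameters are hypothetical names, and in a real script the hooks would wrap the EC2 commands you already have (for example boto3's `run_instances`, `associate_address` and `terminate_instances`). It also adds one design choice the steps above don’t mention: it waits for several consecutive failed checks rather than one, so a single dropped packet doesn’t trigger a failover.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FailoverController:
    # Hypothetical hooks; plain callables so the control flow can be tested without AWS.
    check: Callable[[str], bool]          # health check for an instance id
    launch_from_ami: Callable[[], str]    # assumed to return the new instance id once it is running
    associate_eip: Callable[[str], None]  # move the Elastic IP to the given instance
    terminate: Callable[[str], None]      # shut down the given instance
    active: str                           # instance currently holding the Elastic IP
    failures_before_failover: int = 3
    _failures: int = field(default=0, init=False)

    def tick(self) -> bool:
        """Run one monitoring cycle. Returns True if a failover happened."""
        if self.check(self.active):
            self._failures = 0  # any success resets the consecutive-failure count
            return False
        self._failures += 1
        if self._failures < self.failures_before_failover:
            return False
        # Launch and move the Elastic IP first, then shut down the old instance.
        new_id = self.launch_from_ami()
        self.associate_eip(new_id)
        old_id, self.active = self.active, new_id
        self.terminate(old_id)
        self._failures = 0  # keep watching the new instance
        return True
```

A scheduler (cron, or a `while True` loop with `time.sleep`) on the micro server would call `tick()` once per interval.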

I know how to run the commands in my script to start up and shut down EC2 instances, associate the Elastic IP address with an instance, and ping the server.

My question is: what would be a suitable ping here? Would a standard ping at regular intervals suffice, and what would be a good interval? Or is this a rather simplistic approach, and is there a smarter health check that I should be doing?
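On the health-check question, a plain ICMP ping only shows that the instance’s network stack answers, not that HAProxy is accepting traffic. A minimal sketch of a more application-level check, assuming HAProxy exposes an HTTP endpoint for it (for example one configured with HAProxy’s `monitor-uri` directive; the function names here are hypothetical), might look like the following. The interval question is a trade-off: with a loop that runs a check, sleeps `interval` seconds, and fails over after `threshold` consecutive failures, detection alone takes roughly `threshold * (interval + timeout)` seconds before the replacement instance even starts booting.

```python
import http.client
import urllib.error
import urllib.request


def http_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers with a 2xx/3xx status within `timeout` seconds.

    Connection errors, timeouts, HTTP error statuses (urlopen raises HTTPError
    for 4xx/5xx), malformed responses and malformed URLs all count as unhealthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False


def worst_case_detection_seconds(interval: float, threshold: int, timeout: float) -> float:
    """Approximate worst-case time from a failure to the failover decision, for a
    loop that runs a check (at most `timeout` seconds), sleeps `interval` seconds,
    and fails over after `threshold` consecutive failed checks."""
    return threshold * (interval + timeout)
```

For example, checking every 10 seconds with a 2-second timeout and failing over after 3 misses gives a detection time of about 3 * (10 + 2) = 36 seconds, on top of the instance boot time and the Elastic IP association.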

Also, if anyone foresees any problems with this approach, please feel free to comment.

1 Answer

Editorial Team · Answer 1 · 2026-06-07T00:05:36+00:00

I understand exactly where you’re coming from; my company is in the same position. We care about having a highly available, fault-tolerant system, but the overhead cost simply isn’t viable for the traffic we get.

One problem I have with your solution is that you’re assuming the micro instance and the load balancer won’t both die at the same time. From my experience with Amazon, I can tell you it’s definitely possible: however unlikely, whatever causes your load balancer to die could also take down the micro instance.

Another potential problem is that you also assume you will always be able to start a replacement instance during downtime. This is simply not the case. Take, for example, the outage Amazon had in their us-east-1 region a few days ago: a power outage caused one of their availability zones to lose power. When they restored power and began to recover the instances, their APIs were not working properly because of the sheer load, and it took almost an hour before they were available again. If an outage like this knocks out your load balancer and you’re unable to start another, you’ll be down.
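No retry logic can guarantee a replacement while the EC2 API itself is struggling, but a watchdog script can at least bound how long it keeps trying and fall back to another availability zone. A minimal sketch, assuming a hypothetical `launch(zone)` hook that wraps your existing launch command and raises on failure; in an outage like the one described, every attempt may still fail, and the function simply reports that by returning `None`.

```python
import time
from typing import Callable, Iterable, Optional


def launch_with_retries(
    launch: Callable[[str], str],
    zones: Iterable[str],
    attempts_per_zone: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try to start a replacement load balancer, zone by zone.

    `launch(zone)` is a hypothetical hook that starts an instance from the load
    balancer AMI in `zone` and returns its instance id, raising on any failure.
    Between attempts in the same zone it waits base_delay, 2*base_delay,
    4*base_delay, ... Returns None if every attempt in every zone failed.
    """
    for zone in zones:
        for attempt in range(attempts_per_zone):
            try:
                return launch(zone)
            except Exception:  # e.g. throttling or capacity errors from the EC2 API
                if attempt < attempts_per_zone - 1:
                    sleep(base_delay * 2 ** attempt)
    return None
```

Passing `sleep` in as a parameter keeps the function testable; `None` is the signal to alert a human rather than keep looping.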

That being said, I find the ELBs provided by Amazon are a better solution for me. I’m not sure what the reasoning is behind using HAProxy, but I recommend investigating ELBs, as they let you do things such as auto scaling.

For each ELB you create, Amazon creates one load balancer node in each availability zone that has a registered instance. These are still vulnerable to certain problems during severe outages at Amazon like the one described above. For example, during that outage I could not add new instances to my load balancers, but my existing instances (the ones not affected by the power outage) were still serving requests.

UPDATE 2013-09-30

We recently changed our infrastructure to use a combination of ELB and HAProxy. I find that ELB gives the best availability, but the fact that it uses DNS-based load balancing doesn’t work well for my application. So our setup is an ELB in front of a two-node HAProxy cluster. Using HAProxyCloud, a tool I created for AWS, I can easily add auto scaling groups to the HAProxy servers.
