I have recently been reading the Paxos paper, the FLP theorem, etc., and evaluating Apache ZooKeeper for a project. I have also been going through Chubby (Google's distributed locking service) and the various literature on it that is available online. My fundamental use case for ZooKeeper is to implement replication and general coordination for a distributed system.
I was wondering, though: what specific advantage does ZooKeeper, or a Chubby-like distributed locking system, bring to the table? Basically, why can't I just use a MySQL NDB Cluster? I keep hearing that MySQL has a lot of replication issues. I was hoping someone with more experience on the subject might shed some light on it.
Thanks in advance.
A simplified list of my requirements:
- I have a homogeneous distributed system.
- I need some means of maintaining consistent state across all my nodes.
- My system exposes a service, and interaction with clients will lead to some change in the collective state of my system.
- High availability is a goal, so a node going down must not affect the service.
- I expect the system to serve at least a couple of thousand req/sec.
- I expect the collective state of the system to be bounded in size (basically, inserts/deletes will be transient, but in steady state I expect lots of updates and reads).
It depends on the kind of data you are managing and the scale and fault tolerance you are going for.
I can answer from the ZooKeeper point of view. Before starting, I should mention that ZooKeeper is not a Chubby clone. Specifically, it does not provide locks directly. It is also designed with different ordering and performance requirements in mind.
In ZooKeeper, an entire copy of the system state is memory resident. Changes are replicated using an atomic broadcast protocol and synced to disk (using a change journal) by a majority of ZooKeeper servers before being processed. Because of this, ZooKeeper has deterministic performance and can tolerate failures as long as a majority of servers are up. Even after a big outage, such as a power failure, system state will be preserved as long as a majority of servers come back online. The information stored in ZooKeeper is usually considered the ground truth of the system, so such consistency and durability guarantees are very important.
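The "majority of servers" condition is simple arithmetic, sketched below in Python. The function names are hypothetical helpers for illustration only; they are not part of ZooKeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def max_tolerated_failures(ensemble_size: int) -> int:
    """Servers that can be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


def quorums_intersect(ensemble_size: int) -> bool:
    """Any two majorities share at least one server, because 2 * quorum > ensemble."""
    return 2 * quorum_size(ensemble_size) > ensemble_size


# Columns: ensemble size, quorum size, tolerated failures.
for n in (3, 4, 5, 7):
    print(n, quorum_size(n), max_tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
# 7 4 3
```

Note that 4 servers tolerate no more failures than 3, which is one reason ensembles are usually given an odd number of servers. The intersection property is what lets a change that a majority has journaled survive an outage: whichever majority comes back, at least one of its servers has seen the change.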
The other things that ZooKeeper gives you have to do with monitoring dynamic coordination state. Ephemeral nodes are removed automatically when the session of the client that created them ends, which makes failure detection and group membership easy. The ordering guarantees allow you to do leader election and client-side locking. Finally, watches allow you to monitor system state and quickly respond to changes in it.
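To show how ephemeral nodes, ordering, and watches combine, here is a minimal single-process Python sketch loosely modeled on the leader-election recipe described in the ZooKeeper documentation. `ToyCoordinator` and `Candidate` are hypothetical in-memory stand-ins, not the ZooKeeper client API: there is no network, no real session timeout, and no concurrency. Each candidate creates an ephemeral sequential node, the lowest sequence number leads, and every other candidate watches only the node just ahead of it.

```python
from typing import Callable, Dict, List, Optional


class ToyCoordinator:
    """Hypothetical in-memory stand-in for a ZooKeeper-like service.

    Models ephemeral sequential nodes under a single parent and one-shot watches
    that fire when a watched node is deleted.
    """

    def __init__(self) -> None:
        self._counter = 0
        self._nodes: Dict[str, str] = {}  # node name -> owning session id
        self._watches: Dict[str, List[Callable[[str], None]]] = {}

    def create_ephemeral_sequential(self, prefix: str, session: str) -> str:
        # Zero-padded counter so lexicographic order equals creation order.
        name = f"{prefix}{self._counter:010d}"
        self._counter += 1
        self._nodes[name] = session
        return name

    def children(self) -> List[str]:
        # The live children double as the current group membership.
        return sorted(self._nodes)

    def exists(self, name: str, watch: Optional[Callable[[str], None]] = None) -> bool:
        present = name in self._nodes
        if present and watch is not None:
            self._watches.setdefault(name, []).append(watch)
        return present

    def close_session(self, session: str) -> None:
        # Ephemeral nodes disappear when the owning session ends (e.g. a crash).
        for name in [n for n, s in self._nodes.items() if s == session]:
            del self._nodes[name]
            for callback in self._watches.pop(name, []):  # watches fire once
                callback(name)


class Candidate:
    """Hypothetical election participant: the lowest sequence number leads."""

    def __init__(self, coord: ToyCoordinator, session: str) -> None:
        self.coord = coord
        self.node = coord.create_ephemeral_sequential("candidate-", session)
        self.is_leader = False
        self._check()

    def _check(self, _deleted: str = "") -> None:
        nodes = self.coord.children()
        if self.node not in nodes:
            return  # our own session is gone
        idx = nodes.index(self.node)
        if idx == 0:
            self.is_leader = True
            return
        # Watch only the immediate predecessor, so one deletion wakes one candidate.
        if not self.coord.exists(nodes[idx - 1], watch=self._check):
            self._check()


coord = ToyCoordinator()
a = Candidate(coord, "session-a")
b = Candidate(coord, "session-b")
c = Candidate(coord, "session-c")
print(a.is_leader, b.is_leader, c.is_leader)  # True False False
coord.close_session("session-a")  # simulate the leader crashing
print(b.is_leader, c.is_leader)  # True False
```

Watching only the predecessor, rather than the whole parent, means a leader failure wakes a single candidate instead of all of them. In a real deployment the "session ends" step would come from ZooKeeper expiring the crashed client's session.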
So if you need to manage and respond to dynamic configuration, detect failures, elect leaders, etc., ZooKeeper is what you are looking for. If you need to store lots of data or you need a relational model for that data, MySQL is a much better option.