I’m looking to build a scalable database solution for the back end of my website. I’ve been reading about database design lately, and I seem to have developed an idea on my own that might work. I think this is a novel way of maintaining n databases with synchronized data, but I could be wrong. So I’m asking SO to evaluate the idea and tell me if it’s crazy or not. (or if it already exists and is implemented)
In this scheme there are a group of server nodes. One node runs a query load balancer (Let’s call it A) and the rest are running a typical dbms, let’s call those nodes N collectively.
Each N is disconnected from the others. ie) a node in N doesn’t need to communicate with any of the others. Each N has a connection to A only.
The process works like this
- All database queries are passed through A. (Let’s assume for now A has infinite throughput and processing ability)
- A inspects each query (Q) and determines if it is an operation that will read from a database or a query that will write to a database. (in sql, read would be select and write would be update)
- If Q is a read operation, forward it to one of the nodes in N
- if Q is a write operation, forward it to all of the nodes in N
Assuming it’s implemented properly, this results in all of the nodes in N having synchronized database content. Queries that are only reading data need to be sent to one node.
This idea seems to work especially well for me because in my system there are very few write operations, less than 1%.
So a few questions on this idea
- Does a scheme like this make sense from a theoretical point of view?
- If this does make sense, is there an already implemented solution either commercial or free?
The typical setup for many reads few writes is to have a read/write master db and n replicated slave dbs which are read only. Replication is handled by the RBDMS. Read only queries can be load balanced across all your n read only nodes and if your read/write master goes down temporarily, at least your app will be able to service read operations. You don’t need a central “A” proxy to decide if a query is a read or a write. The client issuing the query should be smart enough to know if it’s reading or writing. That way you won’t be bottlenecked on your “A” server.
Your proposed setup has the distinct flaw in that if you are simultaneously writing to n nodes, what if one or more of those writes fails?