How do I create a database for scalability? I am in the middle of http://www.slideshare.net/vishnu/livejournals-backend-a-history-of-scaling, which I can't finish reading at the moment because I need to leave. But I would like to know more about creating a database that scales well. Some things that it mentioned, and that occur to me, are:
- Separate handles for reads and writes?
- What happens when one server is busy (IO or CPU bound) and I need two servers to write to?
- Do I create multiple databases? Have a clusterId on users?
- Will it be a problem when moving users from one cluster to another?
- Might I code this so user ABC in DB A on cluster A and DEF in DB B on cluster B have the same PRIMARY KEY?
- When I move the above to cluster C, does this mean I need to write much code to move them to another cluster/database?
- To make the above not an issue, would I NOT use PRIMARY KEY and set the ID by hand by reading the other DBs on other clusters?
etc
What happens when one server is busy (IO or CPU bound) and I need two servers to write to?
If you are doing a distributed transaction, you are in trouble: the transaction can only commit once every participating server has responded, so one busy server holds up the rest. You have to plan ahead to make sure load is uniform across the servers your distributed transactions target.
Do I create multiple databases? Have a clusterId on users?
This is a very nice solution :P. You have to get the data model for the shared data right so that your shared catalogues do not become a bottleneck.
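To make the clusterId idea concrete, here is a minimal sketch using in-memory SQLite databases as stand-ins for real servers. The table names (`user_cluster`, `posts`) and the helpers (`assign_user`, `connection_for`, `add_post`) are made up for illustration; the point is only that the shared catalogue holds one tiny row per user and everything else lives on the user's cluster.

```python
import sqlite3

# Shared catalogue (hypothetical schema): the only place that knows
# which cluster a user lives on. One small row per user keeps it cheap.
catalogue = sqlite3.connect(":memory:")
catalogue.execute(
    "CREATE TABLE user_cluster ("
    "user_id INTEGER PRIMARY KEY, cluster_id TEXT NOT NULL)"
)

# Two stand-in clusters; in a real setup these would be separate servers.
clusters = {name: sqlite3.connect(":memory:") for name in ("A", "B")}
for db in clusters.values():
    db.execute("CREATE TABLE posts (user_id INTEGER NOT NULL, body TEXT NOT NULL)")


def assign_user(user_id, cluster_id):
    """Record in the catalogue which cluster holds this user's data."""
    if cluster_id not in clusters:
        raise ValueError(f"unknown cluster {cluster_id!r}")
    with catalogue:  # commits on success, rolls back on error
        catalogue.execute(
            "INSERT INTO user_cluster (user_id, cluster_id) VALUES (?, ?)",
            (user_id, cluster_id),
        )


def connection_for(user_id):
    """Look up the user's cluster in the catalogue and return its connection."""
    row = catalogue.execute(
        "SELECT cluster_id FROM user_cluster WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"user {user_id} is not in the catalogue")
    return clusters[row[0]]


def add_post(user_id, body):
    """Write user data to whichever cluster the catalogue points at."""
    db = connection_for(user_id)
    with db:
        db.execute("INSERT INTO posts (user_id, body) VALUES (?, ?)", (user_id, body))
```

Every request costs one catalogue lookup, which is why the catalogue has to stay small (and is a natural candidate for caching).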
Will it be a problem when moving users from one cluster to another?
No, distributed transactions for the win. You do need a kickass programmer to make sure things happen correctly.
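As a rough illustration of what "happen correctly" involves, here is a sketch of moving one user between clusters. Note that this is not a real distributed transaction (no two-phase commit); it is a simpler copy, repoint, delete ordering, chosen so that a failure part-way through leaves the user still readable on the old cluster. It also assumes the user's writes are paused during the move. All names (`user_cluster`, `posts`, `move_user`) are hypothetical, and in-memory SQLite databases stand in for real servers.

```python
import sqlite3


def make_cluster():
    """Create a stand-in cluster with a single per-user table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (user_id INTEGER NOT NULL, body TEXT NOT NULL)")
    return db


catalogue = sqlite3.connect(":memory:")
catalogue.execute(
    "CREATE TABLE user_cluster ("
    "user_id INTEGER PRIMARY KEY, cluster_id TEXT NOT NULL)"
)
clusters = {"A": make_cluster(), "C": make_cluster()}


def move_user(user_id, target):
    """Move a user's rows to `target`, assuming their writes are paused."""
    row = catalogue.execute(
        "SELECT cluster_id FROM user_cluster WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"user {user_id} is not in the catalogue")
    source = row[0]
    if source == target:
        return  # nothing to do
    src, dst = clusters[source], clusters[target]

    rows = src.execute(
        "SELECT user_id, body FROM posts WHERE user_id = ?", (user_id,)
    ).fetchall()

    # Step 1: copy to the target. If this fails, the source is untouched.
    with dst:
        dst.executemany("INSERT INTO posts (user_id, body) VALUES (?, ?)", rows)

    # Step 2: repoint the catalogue. From here on, reads go to the target.
    with catalogue:
        catalogue.execute(
            "UPDATE user_cluster SET cluster_id = ? WHERE user_id = ?",
            (target, user_id),
        )

    # Step 3: clean up the source. A failure here only leaves stale rows.
    with src:
        src.execute("DELETE FROM posts WHERE user_id = ?", (user_id,))
```

Because the user keeps the same ID throughout, nothing that references the user has to be rewritten; only the catalogue row changes.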
Might I code this so user ABC in DB A on cluster A and DEF in DB B on cluster B have the same PRIMARY KEY?
No, assign the primary key on a master RDBMS/LDAP server. You do not want primary-key collisions of this sort. Your chosen method depends on this being done correctly: you want globally unique user IDs. You will have shared data in this case, and if you do not have globally unique primary keys (GU-PKs), how will you relate the users to the shared data?
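A minimal sketch of central ID assignment, again with in-memory SQLite standing in for the master and the clusters. The schema and helper names (`user_ids`, `allocate_user_id`, `create_user`) are assumptions for illustration: the master hands out the ID, and each cluster stores that ID explicitly instead of generating its own.

```python
import sqlite3

# Stand-in for the master server: the only place that generates user IDs.
master = sqlite3.connect(":memory:")
master.execute(
    "CREATE TABLE user_ids ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, username TEXT UNIQUE NOT NULL)"
)

# Stand-in clusters. Note there is no auto-generated key here: the id
# column is always filled with the value the master handed out.
clusters = {name: sqlite3.connect(":memory:") for name in ("A", "B")}
for db in clusters.values():
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")


def allocate_user_id(username):
    """Ask the master for a new globally unique user ID."""
    with master:
        cur = master.execute("INSERT INTO user_ids (username) VALUES (?)", (username,))
    return cur.lastrowid


def create_user(username, cluster_id):
    """Create the user on a cluster using the centrally assigned ID."""
    user_id = allocate_user_id(username)
    db = clusters[cluster_id]
    with db:
        db.execute("INSERT INTO users (id, username) VALUES (?, ?)", (user_id, username))
    return user_id


abc_id = create_user("ABC", "A")
def_id = create_user("DEF", "B")
# The two users live on different clusters but can never share a key.
```

With IDs assigned this way, moving ABC to another cluster later cannot collide with anyone already there, and shared data can refer to users by ID without knowing which cluster they are on.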