I ran across a comment that made me wonder: If you use a sharding approach to db scalability, does that mean the database is denormalized? Can you have a normalized, sharded database?
They are not mutually exclusive. Both are often used when scaling massive datasets, but one doesn't really have much to do with the other. You can absolutely have a sharded, normalized database...or a denormalized, nonsharded database.
In sharding, you're just taking a given schema (normalized or not) and distributing its data across a number of physical/logical data stores. This lets you, for example, have all your users with a particular characteristic (e.g., last name in 'A-D') live on a given database instance. Note that HOW you shard your database is a crucial decision and will be highly implementation dependent.
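To make the last-name example concrete, here is a minimal sketch of range-based shard routing. The shard map, the instance names (`users_db_0` and so on), and the `shard_for` function are all made up for illustration; a real system would more likely load the map from configuration and might hash the key rather than use ranges.

```python
# Hypothetical shard map: inclusive ranges of last-name initials,
# each mapped to the name of a database instance (names are invented).
SHARD_RANGES = [
    ("A", "D", "users_db_0"),
    ("E", "L", "users_db_1"),
    ("M", "R", "users_db_2"),
    ("S", "Z", "users_db_3"),
]


def shard_for(last_name: str) -> str:
    """Return the instance that holds users with this last name."""
    if not last_name:
        raise ValueError("last_name must be non-empty")
    initial = last_name[0].upper()
    for low, high, instance in SHARD_RANGES:
        if low <= initial <= high:
            return instance
    # Initials outside A-Z (digits, accented letters) have no shard here.
    raise ValueError(f"no shard configured for initial {initial!r}")
```

Note that nothing here says anything about the schema: every instance can hold the same fully normalized tables, just for a different slice of users.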
Denormalization, on the other hand, can be done with or without a sharded database and is intended to simplify queries by reducing the joins/subqueries needed to answer a particular question. The trade-off is that the same fact now lives in more than one place, so you would typically have to maintain data integrity programmatically.
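As a rough illustration of that trade-off, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The tables (`users`, `orders`, `orders_denorm`) and the helper functions are invented for the example. The normalized lookup needs a join, the denormalized one reads a single table, and `rename_user` shows the extra write the application then has to do to keep the copied name consistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: the user's name is stored once, in users.
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER NOT NULL REFERENCES users(id),
                         total REAL NOT NULL);
    -- Denormalized: the name is copied onto every order row.
    CREATE TABLE orders_denorm (id INTEGER PRIMARY KEY,
                                user_id INTEGER NOT NULL,
                                user_name TEXT NOT NULL,
                                total REAL NOT NULL);
    INSERT INTO users VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 25.0);
    INSERT INTO orders_denorm VALUES (10, 1, 'Ada', 25.0);
""")


def normalized_lookup(order_id):
    # Requires a join to get the customer's name.
    return conn.execute(
        "SELECT u.name, o.total FROM orders o "
        "JOIN users u ON u.id = o.user_id WHERE o.id = ?",
        (order_id,),
    ).fetchone()


def denormalized_lookup(order_id):
    # No join: everything needed is on the order row.
    return conn.execute(
        "SELECT user_name, total FROM orders_denorm WHERE id = ?",
        (order_id,),
    ).fetchone()


def rename_user(user_id, new_name):
    # The application must update every copy of the name itself;
    # both writes run in one transaction so they succeed or fail together.
    with conn:
        conn.execute("UPDATE users SET name = ? WHERE id = ?",
                     (new_name, user_id))
        conn.execute("UPDATE orders_denorm SET user_name = ? WHERE user_id = ?",
                     (new_name, user_id))
```

Either layout could be sharded or not; the choice between them is about query shape and write cost, not about how the data is distributed.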
Some good reading:
Sharding theory & practice
Some highly-scalable database implementations ‘in the wild’