I have been given the task to design and develop a web application for a NGO (Non Govt Org) which runs primary schools in many towns and villages. The application will keep a record of all the schools, students, volunteers and the teachers of every school. Currently there are about 30 schools in the NGO’s umbrella but they have very ambitious plan to increase the number very rapidly.
We will host the app on Windows Azure using SQL Azure as the database. Now I am facing a tough task on how to design its database with minimum expenditure (as the NGO is funded completely by charities and donations). As you might know that the databases in SQL Azure is offered in specific sizes like 5,10,20 to 50 GB, it puts a restriction on the maximum size of each database. I have came out with following approaches:
1) For every school create a separate database of 5 or 10 GB size. Each database will have tables like ‘student’,’subject’,’attendance’ etc. The problem in this approach is that a lot of databases will have to be created. One for every school. This would drastically shoot up the cost. Also initially a large portion of 10 GB size will be under-utilized, but in future it may happen that 10GB would seem less for storing a school’s data.
2) Keep a single database with tables like ‘school’,’student’,’attendance’ etc. This would keep the cost low initially but during course of time the database would start filling up and may reach the max limit of 50GB as more schools are opened by the NGO. Also a single table for ‘student’ and especially ‘attendance’ will have huge number of records and would make the queries slow. Even if we add another database in future then how easy would it be to split the tables across several databases.
Keeping the limitations in mind we are unable to proceed further.
Any approach or suggestion by you will be very helpful for us.
Thanks in advance.
EDIT: Thanks a lot to the people answered my question. i got the point: 5O GB is a huge space and it would not get filled any time soon. But that brings me a question: Consider a situation when the number of schools grow up to 200, 300 or 1000 !! Then how should be my database design ? I suppose 50 GB would not be big at that situation.
I used to work for a company that makes school systems; although 50GB would be considered large for most of them, a few had databases that were much larger. Historical records are typically the issue here, specially if you will add additional features over time, such as lead import.
You described two scenarios: a linear shard and a scale up architecture. The linear shard implements a database per school. The scale up puts them all in the same database. There are additional options to consider with SQL Azure. See one of my blob posts about a white paper I published regarding various scalability models: http://geekswithblogs.net/hroggero/archive/2010/12/23/multitenant-design-for-sql-azure-white-paper-available.aspx
Also SQL Azure announced an upcoming feature called Data Federation. This is most likely for you. Here are two blog posts you may find relevant:
http://geekswithblogs.net/hroggero/archive/2011/07/23/preparing-for-data-federation-in-sql-azure.aspx
http://geekswithblogs.net/hroggero/archive/2011/09/07/sharding-library-for-sql-azure-data-federation.aspx
The last link discusses an open-source library, called the Enzo Shard, that I am building to assist developers in taking advantage of the future capabilities of SQL Azure Data Federation. The version that supports data federation is in Beta and allows parallel queries to be performed across federation members (i.e. databases).
Finally, don’t miss the posts by Cihan (from Microsoft) that discusses this feature in greater details: http://blogs.msdn.com/b/cbiyikoglu/
In summary, the field of scalability in SQL Azure is evolving. However many capabilities will be coming providing significant data growth and performance oppotunities.