What database would you suggest for a startup that might possibly grow very fast?
To be more specific:
- We are using JSON to interchange data with mobile clients, so the data should be stored ideally in this format
- The data model is relatively simple, like users, categories, history of actions…
- The users interact in “real time” (5 second propagation delay is still OK)
- The queries are known beforehand (can cache results or use mapreduce)
- The system would have up to 10000 concurrent users (just guessing…)
- Transactions are a plus but can live without them I think
- Spatially enabled is a plus
- The data replication between nodes should be easy to administer
- Open source
- Hosting services available (we’d like to outsource the sysadmin part)
We now have a working private prototype on a standard relational PostgreSQL/PostGIS stack. But scalability questions aside, I have to convert relational data to JSON and back, which seems like overhead under high load.
I did a little research but I lack experience with all the new NoSQL stuff.
So far, I think of these solutions:
- Couchbase: master-master replication, native JSON document store, spatial extension, couchapps. I don’t know Iris Couch hosting, but these seem like good technologies. The downsides I see so far are JavaScript debugging and disk usage.
- MongoDB: has only one master but safe failover. Uses binary JSON.
- Cluster MySQL: the evergreen of web (one master I think)
- PostgreSQL & Slony: because I just love Postgres :-)
But there are plenty of others, Cassandra, Membase…
Do you guys have some real life experience? The bad one counts too!
Thanks in advance,
Karel
Unless you are already having problems with scaling, you can’t really know what you will need in the future. You should base your design decisions on what you need now, not on your best estimate of how many customers you might eventually have. Remember, you have to impress your first few customers with how well your product solves their problems before you can worry about impressing your 10,000th.
That said, I’ve found that it’s almost always necessary to have basically everything:
As for the overhead of converting relational data to JSON and back: not really. The expensive stuff is IO and poorly written queries. The marshalling/unmarshalling is pure CPU, which is about the cheapest thing in the world to grow. Don’t worry about it.
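If you want to check the marshalling cost yourself, here is a minimal sketch in Python using only the standard library. The column names and the in-memory rows are made up for illustration and stand in for whatever your queries actually return; no database is involved, so it measures only the CPU part of the round trip.

```python
import json
import time

# Hypothetical columns and rows standing in for a relational query result.
COLUMNS = ("id", "name", "category", "lat", "lon")
ROWS = [
    (i, f"user{i}", "general", 50.0 + i * 1e-4, 14.0 + i * 1e-4)
    for i in range(10_000)
]


def rows_to_json(columns, rows):
    """Marshal relational rows into a JSON array of objects."""
    return json.dumps([dict(zip(columns, row)) for row in rows])


def json_to_rows(columns, payload):
    """Unmarshal a JSON array of objects back into row tuples."""
    return [tuple(obj[c] for c in columns) for obj in json.loads(payload)]


# Time one full round trip: rows -> JSON text -> rows.
start = time.perf_counter()
payload = rows_to_json(COLUMNS, ROWS)
restored = json_to_rows(COLUMNS, payload)
elapsed = time.perf_counter() - start

print(f"round-tripped {len(restored)} rows in {elapsed * 1000:.1f} ms")
```

The timing will vary by machine. Compare it against how long the query behind those rows takes before deciding the conversion is worth optimizing.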