I am currently researching which database to use for a project I am working on. Hopefully you guys can give me some hints.
The project is an automated web crawler that checks websites as per a user’s request, scrapes data under certain circumstances, and creates log files of what was done.
Requirements:
- Only a few tables with few columns; predefining columns is no problem
- No overly complex associations between models
- A huge number of date- and time-based queries
- Due to logging, database will grow rapidly and use up a lot of space
- Should be able to scale over multiple servers
- Fields contain mostly IDs (int), strings (around 200-500 characters max), and Unix timestamps
- Two different types of servers will simultaneously read/write data directly to/from it:
  - One (later more) Rails app that takes user input and displays results upon request
  - One (later more) Node.js server that functions as the executing crawler/scraper. It will have enough load to run continuously and make dozens of database queries every second.
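To make the data shape more concrete, here is a rough sketch of what a single log record and a typical time-range query might look like. All field names are made up for illustration, and the sorted in-memory list only stands in for whatever timestamp index the chosen database would provide.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical fields: a couple of integer ids, short strings,
    # and a Unix timestamp, as described in the requirements.
    id: int
    user_id: int
    url: str
    message: str
    created_at: int  # Unix timestamp in seconds


def entries_between(entries, start_ts, end_ts):
    """Return entries with start_ts <= created_at < end_ts.

    Assumes `entries` is already sorted by created_at, which plays the
    role of a database index on the timestamp column.
    """
    keys = [e.created_at for e in entries]
    lo = bisect_left(keys, start_ts)
    hi = bisect_left(keys, end_ts)
    return entries[lo:hi]


# Three sample records, one minute apart (values are illustrative only).
log = [
    LogEntry(1, 42, "http://example.com/a", "checked", 1_300_000_000),
    LogEntry(2, 42, "http://example.com/b", "scraped", 1_300_000_060),
    LogEntry(3, 7, "http://example.com/c", "checked", 1_300_000_120),
]

# Everything logged in the first two minutes.
recent = entries_between(log, 1_300_000_000, 1_300_000_120)
```

Nearly every query would be some variation of this: filter by a time window, possibly combined with a user or site id.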
I assume it will be neither a graph database (no complex associations) nor an in-memory key/value store (too much data to hold in cache). I’m still on the fence about every other type of database I could find; each seems to have its merits.
So, any advice from the pros on how I should decide?
I would agree with Vladimir that you would want to consider a document-based database for this scenario. I am most familiar with MongoDB. My reasons for using it here are as follows: