We are currently developing a fairly large application that will have to handle a huge number of records.
The idea is that e-mails are stored (with attachments) and users can search their stored e-mails via a web API. Users should be able to search (within the messages they have exported into the database/storage) on at least the following items:
- from
- to
- subject
- date (range)
- attachments (names & types only)
- message contents
- (optional) mailbox / folder structure
The application should be able to handle a large number of users and an extreme number of e-mails (easily growing from millions to billions). Users should be able to download the complete original message (with attachments) so they can import it into their e-mail client.
I was thinking about indexing the e-mails in a database and storing each full e-mail, with its attachments, as a package under a unique key in a separate storage. This way I should keep the database load as low as possible and therefore the search as quick as possible.
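The split described above could be sketched roughly as follows. This is only an illustration under assumptions: the field names, the `owner_id/sha256` key scheme, and the plain `dict` standing in for the separate storage (for example an object store) are all hypothetical and not taken from any existing schema. Parsing uses Python's standard `email` package.

```python
import hashlib
from email import message_from_bytes, policy
from email.utils import parsedate_to_datetime


def split_email(raw: bytes, owner_id: str, blob_store: dict) -> dict:
    """Store the raw message under a unique key and return a searchable index record."""
    # Hypothetical key scheme: owner id plus the SHA-256 of the raw bytes.
    key = f"{owner_id}/{hashlib.sha256(raw).hexdigest()}"
    blob_store[key] = raw  # stand-in for the separate storage holding the original

    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    date_header = msg["Date"]
    return {
        "owner_id": owner_id,
        "blob_key": key,  # used later to fetch the original for download
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        "subject": str(msg["Subject"]),
        "date": parsedate_to_datetime(str(date_header)) if date_header else None,
        # Names and types only; attachment contents stay in the blob.
        "attachments": [
            {"name": part.get_filename(), "type": part.get_content_type()}
            for part in msg.iter_attachments()
        ],
        "body_text": body.get_content() if body is not None else "",
    }


# Usage with a tiny hand-written message.
raw = (
    b"From: alice@example.com\n"
    b"To: bob@example.com\n"
    b"Subject: Hello\n"
    b"Date: Mon, 02 Jan 2023 10:00:00 +0000\n"
    b"\n"
    b"Hi Bob\n"
)
store = {}
record = split_email(raw, "user-1", store)
```

Only `record` would go into the search database; the download feature would read `store[record["blob_key"]]` and return the bytes unchanged.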
I have found several database schemas for handling e-mail like this, but I couldn’t find any database that can handle hundreds of millions, and maybe even billions, of records (e-mails).
Is this the best way to keep it simple and fast, or am I forgetting anything?
// edit
The idea is to run this on the Amazon cloud (any suggestions related to that?)
You can use MongoDB for this amount of data.
Details on MongoDB are available at http://www.mongodb.org/
In MongoDB, the equivalent of a MySQL table is called a collection, and the equivalent of a row is called a document.
MongoDB stores data as JSON-like documents (BSON).
One possible way to design the DB schema is here.
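As a rough illustration of what a per-e-mail document and a per-user search filter might look like, here is a small sketch in plain Python. The field names and the `build_filter` helper are hypothetical. The operators it emits (`$gte`, `$lte`, `$text` with `$search`, and dotted paths such as `attachments.type`) are standard MongoDB query syntax, and `$text` assumes a text index exists on the searched fields. No database driver is used; the function only builds the filter dictionary that would be passed to a `find` call.

```python
from datetime import datetime, timezone

# Hypothetical shape of one document in an "emails" collection.
example_document = {
    "owner_id": "user-1",
    "blob_key": "user-1/abc123",  # points at the stored original message
    "from": "alice@example.com",
    "to": ["bob@example.com"],
    "subject": "Quarterly report",
    "date": datetime(2023, 1, 2, 10, 0, tzinfo=timezone.utc),
    "attachments": [{"name": "report.pdf", "type": "application/pdf"}],
    "folder": "Inbox/Reports",
}


def build_filter(owner_id, sender=None, date_from=None, date_to=None,
                 attachment_type=None, text=None):
    """Build a MongoDB-style filter that is always scoped to one user."""
    query = {"owner_id": owner_id}
    if sender:
        query["from"] = sender
    date_range = {}
    if date_from:
        date_range["$gte"] = date_from
    if date_to:
        date_range["$lte"] = date_to
    if date_range:
        query["date"] = date_range
    if attachment_type:
        # Dotted path matches any element of the attachments array.
        query["attachments.type"] = attachment_type
    if text:
        # Assumes a text index on the subject/body fields.
        query["$text"] = {"$search": text}
    return query


query = build_filter(
    "user-1",
    sender="alice@example.com",
    date_from=datetime(2023, 1, 1, tzinfo=timezone.utc),
    attachment_type="application/pdf",
)
```

Putting `owner_id` in every filter keeps searches within a user's own messages, and it would also be a natural leading field for a compound index or a shard key, though that choice depends on the actual workload.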