My app receives approximately 2000 string messages per second; each message is about 300 characters long.
I need to store all messages in a DB. I’m using SQL Express 2008 and .NET.
I’m thinking of holding all the data in memory until it reaches a certain limit (for example, 10000 messages = 5 seconds) and then writing it all at once.
This way the data will be written to the hard drive every 5 seconds instead of every second.
Is my approach good enough? What approach should I use to achieve the following results?
- messages are not piling up in memory
- Hard drive won’t commit suicide 🙂
Note: there is no need to parse the strings; they only need to be stored in the order they arrived.
A quick calculation indicates that you may accumulate roughly 50 GB of data per day. If there is no SQL-specific processing to be done on this data, then storing it in a database doesn’t seem feasible.
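For reference, the estimate can be reproduced with a few lines of arithmetic. This is only a sketch in Python (not the .NET stack from the question), and it assumes one byte per character, which the question does not state; a two-byte encoding would double the figure.

```python
# Figures taken from the question; one byte per character is an assumption.
MESSAGES_PER_SECOND = 2000
BYTES_PER_MESSAGE = 300
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_bytes(rate=MESSAGES_PER_SECOND, size=BYTES_PER_MESSAGE):
    """Return the number of bytes received in one day at a constant rate."""
    return rate * size * SECONDS_PER_DAY


bytes_per_day = daily_volume_bytes()
print(f"{bytes_per_day / 1e9:.2f} GB per day")  # prints "51.84 GB per day"
```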
The next solution would be files on disk, and since you are dealing with plain text (not binary), compression could also help. However, individual messages are so small (about 300 bytes) that compressing them one by one would not yield any meaningful savings. The data would need to be grouped into larger files, for instance one message per line and one file per day. Such a file would be large enough for compression to give satisfactory results if disk space becomes an issue.
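A minimal sketch of the one-file-per-day idea, combined with the batching from the question, might look like the following. It is written in Python purely for illustration; the class name `DailyLogWriter`, the `compress_day` helper, the `yyyymmdd.txt` file naming, and the default batch size are all assumptions, and it assumes messages never contain a newline.

```python
import datetime
import gzip
import os
import shutil


class DailyLogWriter:
    """Buffers messages in memory and appends them to one text file per day."""

    def __init__(self, directory, batch_size=10000):
        self.directory = directory
        self.batch_size = batch_size  # hypothetical limit, as in the question
        self.buffer = []
        os.makedirs(directory, exist_ok=True)

    def path_for(self, day):
        # One file per day, e.g. 20110425.txt (naming is an assumption).
        return os.path.join(self.directory, day.strftime("%Y%m%d") + ".txt")

    def add(self, message, now=None):
        now = now or datetime.datetime.now()
        self.buffer.append((now.date(), message))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Group buffered messages by day; arrival order is kept within a day.
        pending, self.buffer = self.buffer, []
        by_day = {}
        for day, message in pending:
            by_day.setdefault(day, []).append(message)
        for day, messages in by_day.items():
            # Append mode: one message per line, one write burst per batch.
            with open(self.path_for(day), "a", encoding="utf-8") as f:
                f.writelines(m + "\n" for m in messages)


def compress_day(path):
    """Gzip a finished daily file and remove the uncompressed original."""
    with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return path + ".gz"
```

Calling `flush()` on shutdown (and perhaps on a timer) would keep a partially filled buffer from being lost, and `compress_day` would only be run on files from days that are already over.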
If space is not an issue, and/or frequent processing of this data (or even simultaneous processing of data from different days) is to be expected, then one message per file would be a better choice. This solution, in turn, brings the problem of having a very large number of files inside a single folder, which will not only run into file system limitations but also cause performance problems when working with these files, and those problems will affect the performance of the entire machine.
A better way to store and access a large number of files is partitioned folder storage. That is, each file has a unique name and is placed in a specific folder hierarchy according to that name. This approach has several advantages:
Sample partitioning:
- File name: yyyymmddhhss-<counter>.txt (e.g.: 201104252345-1.txt, 201104252345-2.txt, etc)
- Folder hierarchy: \yyyy\mm\dd\ or \yyyy\mm\dd\hh\ etc (as many levels as the solution would need to keep the number of files manageable)
- Example: 201104252345-1.txt being stored as 2011\04\25\201104252345-1.txt, etc
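The mapping from file name to folder can be sketched as a small function. This is an illustration in Python rather than .NET; the function name `partitioned_path` and the `by_hour` switch are hypothetical, and the function only relies on the first ten digits of the name (year, month, day, hour).

```python
from pathlib import PureWindowsPath


def partitioned_path(file_name, by_hour=False):
    """Return the relative path for a file named like 201104252345-1.txt."""
    stamp = file_name.split("-", 1)[0]
    if len(stamp) < 10 or not stamp[:10].isdigit():
        raise ValueError(f"unexpected file name: {file_name!r}")
    # yyyy \ mm \ dd, taken from the leading digits of the name.
    parts = [stamp[0:4], stamp[4:6], stamp[6:8]]
    if by_hour:
        # Optional extra level (hh) when a day holds too many files.
        parts.append(stamp[8:10])
    return PureWindowsPath(*parts, file_name)


print(partitioned_path("201104252345-1.txt"))
# prints 2011\04\25\201104252345-1.txt
```

Because the folder is derived from the name alone, any process that knows a file's name can locate it without searching or keeping a separate index.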