I couldn’t find a good title for the question, but this is what I’m trying to do:
- This is a .NET application.
- I need to store up to 200,000 objects (each between 3 KB and 500 KB).
- I need to store about 10 of them per second from multiple threads.
- I serialize each object with binary serialization before storing it.
- I need to access them later by a unique integer ID.
What’s the best way to do this?
- I can’t keep them in memory, as I’d get OutOfMemoryException errors.
- If I store them on disk as separate files, what are the possible performance issues? Would it noticeably decrease overall performance?
- Should I implement some sort of batching, for example combining 100 objects and writing them as one file, then parsing them later? Or something similar?
- Should I use a database? (Access time is not important; there won’t be any searching, and I’ll access each object only a couple of times by its known unique ID.) In theory I don’t need a database, and I don’t want to complicate this.
UPDATE:
- I assume a database would be slower than the file system; prove me wrong if you have evidence to the contrary. That’s why I’m leaning towards the file system. What I’m really worried about is writing 200 KB × 10 per second to the HDD (this can be any HDD; I don’t control the hardware, since it’s a desktop tool that will be deployed on different systems).
- If I use the file system, I’ll store files in separate folders to avoid file-system-related issues (so you can ignore that limitation).
If you want to avoid using a database, you can store them as files on disk (to keep things simple). But you need to be aware of filesystem considerations when maintaining a large number of files in a single directory.
Many common filesystems keep the entries of a directory in some kind of sequential structure (e.g., file pointers or inodes stored one after another, or in a linked list). Opening a file near the end of such a list is slow, because the lookup has to scan past all the entries before it.
A good solution is to limit your directory to a small number of nodes (say n = 1000), and create a tree of files under the directory.
So instead of storing files as:
/dir/file1
/dir/file2
/dir/file3
…
/dir/fileN
Store them as:
/dir/r1/s2/file1
/dir/r1/s2/file2
…
/dir/rM/sN/fileP
By splitting up your files this way, you improve access time significantly across most file systems.
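As a rough illustration of the tree layout, here is a minimal sketch that maps an integer ID to a two-level path. It is written in Python only to keep it short; the same arithmetic ports directly to a .NET application. The names `path_for_id`, `store`, `load`, and `FILES_PER_DIR` are hypothetical, and the sketch assumes non-negative IDs and the cap of 1000 entries per directory suggested above.

```python
import os

FILES_PER_DIR = 1000  # assumed cap per directory, matching "n = 1000" above


def path_for_id(root, object_id):
    """Map a non-negative integer id to root/rX/sY/fileID."""
    bucket = object_id // FILES_PER_DIR       # leaf directory holding this id
    top, sub = divmod(bucket, FILES_PER_DIR)  # split the bucket into two levels
    return os.path.join(root, f"r{top}", f"s{sub}", f"file{object_id}")


def store(root, object_id, payload):
    """Write already-serialized bytes to the sharded location for this id."""
    path = path_for_id(root, object_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)  # create r*/s* on demand
    with open(path, "wb") as f:
        f.write(payload)
    return path


def load(root, object_id):
    """Read back the bytes stored for this id."""
    with open(path_for_id(root, object_id), "rb") as f:
        return f.read()
```

Because the path is computed from the ID alone, no index file is needed to find an object later. With 200,000 sequential IDs this layout yields about 200 leaf directories of at most 1000 files each.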
(Note that some newer filesystems index directory entries with trees or other structures. This technique works on those too.)
Other considerations are tuning your filesystem (block sizes, partitioning etc.) and your buffer cache such that you get good locality of data. Depending on your OS and filesystem, there are many ways to do this – you’ll probably need to look them up.
Alternatively, if this doesn’t cut it, you can use some kind of embedded database like SQLite or Firebird.
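To give an idea of how little code the embedded-database route needs, here is a minimal sketch using SQLite through Python's standard `sqlite3` module (a .NET application would use a SQLite binding instead). The table name `objects` and the helpers `open_store`, `put`, and `get` are hypothetical, and the sketch uses a single connection, so it does not address writing from multiple threads.

```python
import sqlite3


def open_store(db_path):
    """Open (or create) a database with one table keyed by integer id."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        "id INTEGER PRIMARY KEY, data BLOB NOT NULL)"
    )
    return conn


def put(conn, object_id, payload):
    """Store already-serialized bytes under the given id."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO objects (id, data) VALUES (?, ?)",
            (object_id, payload),
        )


def get(conn, object_id):
    """Return the stored bytes, or None if the id is unknown."""
    row = conn.execute(
        "SELECT data FROM objects WHERE id = ?", (object_id,)
    ).fetchone()
    return None if row is None else row[0]
```

Everything lives in one file on disk, and lookup by ID goes through the primary key, so there is no directory layout to manage.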
HTH.