I need to make a collection of key-value pairs (similar to std::map<std::string, std::string>) that is stored on disk and can be accessed by multiple threads at once. Keys can be added or removed, values can be changed, and keys are unique. The whole thing might not fit into memory at once, so updates to the map must be saved to disk.
I understand how to deal with multithreading issues, but I’m not sure which data structure is suitable for storing data on disk. Pretty much anything I can think of can change its structure dramatically on an update and cause a massive rewrite of the disk storage if I approach the problem head-on. On the other hand, relational databases and the Windows registry deal with this problem, so there must be a way to approach it.
Is there a data structure that is “made” for such a scenario?
Or do I simply use any traditional data structure (trees or skip lists, for example) and write some kind of “memory manager” (a disk-backed “heap”) that allocates chunks of disk space, loads them into memory on request, and writes them back to disk when necessary? I can imagine how to write such a “disk-based heap”, but that solution isn’t very elegant, especially once you add multithreading to the picture.
Ideas?
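The data structure that is “made” for your scenario is the B-tree, or one of its variants such as the B+ tree. Each node is sized to fit one disk page and holds many keys, so the tree stays shallow, and an insert, update, or delete normally rewrites only the page it touches, plus a few neighbours when a node splits or merges. That is why most relational databases use B-trees for their indexes.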
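To make the "one node per page" idea concrete, here is a minimal sketch of a single B+ tree leaf stored as a fixed-size page. Everything in it is an assumption made for illustration: the names (`LeafPage`, `leaf_put`, `leaf_get`, `write_page`, `read_page`), the 4096-byte page size, and the fixed-width key and value slots (real implementations usually support variable-length records). It leaves out internal nodes, node splitting, locking, and crash safety, so treat it as a picture of the on-disk layout rather than a working tree.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <optional>
#include <sstream>   // only needed to try the page I/O without a real file
#include <string>
#include <type_traits>

// Hypothetical sizes chosen for illustration.
constexpr std::size_t kPageSize = 4096;
constexpr std::size_t kKeyLen = 32;   // keys are NUL-padded, max 31 chars
constexpr std::size_t kValLen = 64;   // values are NUL-padded, max 63 chars

struct Slot {
    char key[kKeyLen];
    char value[kValLen];
};

constexpr std::size_t kSlotsPerLeaf = (kPageSize - 8) / sizeof(Slot);

// One leaf node == one page on disk. Slots are kept sorted by key.
struct LeafPage {
    std::uint32_t count;      // number of used slots
    std::uint32_t next_leaf;  // page id of the right sibling (unused here)
    Slot slots[kSlotsPerLeaf];
};
static_assert(sizeof(LeafPage) <= kPageSize, "leaf must fit in one page");
static_assert(std::is_trivially_copyable<LeafPage>::value, "needed for memcpy");

// Index of the first slot whose key is >= `key` (binary search in the page).
std::size_t lower_bound_index(const LeafPage& p, const std::string& key) {
    assert(p.count <= kSlotsPerLeaf);
    const Slot* first = p.slots;
    const Slot* it = std::lower_bound(
        first, first + p.count, key,
        [](const Slot& s, const std::string& k) {
            return std::strncmp(s.key, k.c_str(), kKeyLen) < 0;
        });
    return static_cast<std::size_t>(it - first);
}

// Insert or overwrite. Returns false if the key or value is too long, or if
// the page is full (a real B+ tree would split the leaf at that point).
bool leaf_put(LeafPage& p, const std::string& key, const std::string& value) {
    if (key.size() >= kKeyLen || value.size() >= kValLen) return false;
    std::size_t i = lower_bound_index(p, key);
    bool exists = i < p.count &&
                  std::strncmp(p.slots[i].key, key.c_str(), kKeyLen) == 0;
    if (!exists) {
        if (p.count == kSlotsPerLeaf) return false;
        // Shift the tail right by one slot to keep the page sorted.
        std::move_backward(p.slots + i, p.slots + p.count,
                           p.slots + p.count + 1);
        ++p.count;
        std::memset(p.slots[i].key, 0, kKeyLen);
        std::memcpy(p.slots[i].key, key.data(), key.size());
    }
    std::memset(p.slots[i].value, 0, kValLen);
    std::memcpy(p.slots[i].value, value.data(), value.size());
    return true;
}

std::optional<std::string> leaf_get(const LeafPage& p, const std::string& key) {
    std::size_t i = lower_bound_index(p, key);
    if (i < p.count && std::strncmp(p.slots[i].key, key.c_str(), kKeyLen) == 0)
        return std::string(p.slots[i].value);
    return std::nullopt;
}

// A page lives at byte offset page_id * kPageSize, so updating one leaf
// rewrites exactly one page of the file.
void write_page(std::ostream& out, std::uint32_t page_id, const LeafPage& p) {
    char buf[kPageSize] = {};
    std::memcpy(buf, &p, sizeof(LeafPage));
    out.seekp(static_cast<std::streamoff>(page_id * kPageSize));
    out.write(buf, kPageSize);
}

bool read_page(std::istream& in, std::uint32_t page_id, LeafPage& p) {
    char buf[kPageSize];
    in.seekg(static_cast<std::streamoff>(page_id * kPageSize));
    if (!in.read(buf, kPageSize)) return false;
    std::memcpy(&p, buf, sizeof(LeafPage));
    return true;
}
```

With a layout like this, changing a value means: read the page that holds the key, call `leaf_put`, and write that one page back. The rest of the file is untouched, which is the property the question is looking for. For multithreading, a common approach is to lock individual pages rather than the whole map, but that is beyond this sketch.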