Suppose I want to store a complicated data structure (a tree, say) to disk. The nodes in my data structure are connected by in-memory pointers, but I can’t just write these pointers to disk, because when I read the data structure back the memory locations will have changed.
So what is the right way to store the pointers on disk? Is the answer as simple as (File, Offset), or is there something that I’m missing? I can intuit how pointers might be converted to (File, offset) pairs, and back again, but are there some subtleties that I should watch out for?
Edit: I should mention that I’m especially interested in how a database would do this internally, for a b-tree. I probably made the question more general than I should have, though I do appreciate the XML-based answers.
Your intuition about (file, offset) pairs is correct.
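To make the (file, offset) idea concrete, here is a minimal sketch in Python, assuming a single file (so the "file" half of the pair is implicit) and a plain binary tree with integer keys. The record layout, the `NULL` sentinel, and the names `Node`, `write_tree`, `read_tree` and `roundtrip` are all made up for illustration; they are not taken from any particular database. Children are written before their parent, so each child's offset is already known when the parent record is written.

```python
import io
import struct
from dataclasses import dataclass
from typing import Optional

# Hypothetical on-disk record: key, left-child offset, right-child offset.
# "<qqq" = three little-endian signed 64-bit integers (24 bytes per node).
NODE = struct.Struct("<qqq")
NULL = -1  # sentinel offset meaning "no child" (an assumption of this sketch)


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def write_tree(f, node):
    """Write the subtree post-order; return the offset of its root record."""
    if node is None:
        return NULL
    left_off = write_tree(f, node.left)
    right_off = write_tree(f, node.right)
    offset = f.tell()  # where this node's record starts in the file
    f.write(NODE.pack(node.key, left_off, right_off))
    return offset


def read_tree(f, offset):
    """Rebuild the in-memory subtree whose root record starts at `offset`."""
    if offset == NULL:
        return None
    f.seek(offset)
    key, left_off, right_off = NODE.unpack(f.read(NODE.size))
    return Node(key, read_tree(f, left_off), read_tree(f, right_off))


def roundtrip(tree):
    """Serialize to an in-memory buffer and read the tree back."""
    buf = io.BytesIO()
    root_off = write_tree(buf, tree)
    return read_tree(buf, root_off)
```

The one subtlety this shows is that you have to remember the root's offset somewhere (a real format would typically keep it in a fixed-position header), and that fixing the byte order and integer width in the format keeps the file readable across machines.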
An important thing to watch out for when storing data on disk is that disks are slow. For that reason, special data structures have been designed to store “searchable” data on disk. Accessing the nodes of a binary search tree stored on disk through (file, offset) pointers would be orders of magnitude slower than accessing them in memory, because each pointer you follow may cost a separate disk access.
If speed of access is important, you’d want to store things that are expected to be accessed together close together on disk. Two data structures commonly used for this are the B-tree and the B+ tree; look these up to find out how to use them. Databases and similar applications also use sophisticated caching algorithms to keep frequently used data in memory, so that they do not need to go to disk to retrieve the same data again and again.
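As a rough illustration of the caching idea, here is a small sketch of a least-recently-used page cache in Python. It assumes the file is divided into fixed-size pages addressed by page number (so a "pointer" becomes a page number rather than a byte offset); the 4096-byte page size, the `PageCache` class and the `disk_reads` counter are assumptions of the sketch, and real databases use considerably more elaborate policies.

```python
import io
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size; real systems choose their own


class PageCache:
    """Tiny LRU cache over fixed-size pages of a seekable binary file."""

    def __init__(self, f, capacity=2):
        self.f = f
        self.capacity = capacity
        self.pages = OrderedDict()  # page number -> bytes, oldest first
        self.disk_reads = 0  # counts how often we actually hit the file

    def get(self, page_no):
        if page_no in self.pages:
            self.pages.move_to_end(page_no)  # mark as most recently used
            return self.pages[page_no]
        self.f.seek(page_no * PAGE_SIZE)
        data = self.f.read(PAGE_SIZE)
        self.disk_reads += 1
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict least recently used page
        return data
```

If each B-tree node is sized to fill one page, a lookup touches only a handful of pages, and the ones near the root tend to stay in the cache because they are used by almost every search.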
If speed of access is not important, then simply “serializing” data on disk in the form of XML as suggested by Aiden and Darren is good enough.
Edit: If you need more details about how databases store data on disk, you’d need to learn more about database theory. I’d suggest reading a good book on databases, so that you understand the requirements that drive the disk format. Note that I am mostly referring to relational databases here; there are other breeds of databases, which have completely different requirements and hence different disk formats. Starting with relational databases is a good thing to do though, since they are the most commonly used.
In short a few things that affect relational database disk format are:
Query optimization is an important branch of database theory, concerned with minimizing the disk accesses needed to satisfy a query. Hopefully, this will get you started in the right direction.