I have abstracted out a sorted list that I need to keep in memory in C. One layout is best for writing and the other for reading.
WRITE: search KeyNumeric then KeyAlpha and write *Data
Key1 : [ KeyA, *Data1A, KeyB, *Data1B, KeyC, *Data1C ]
Key2 : [ KeyA, *Data2A, KeyB, *Data2B, KeyC, *Data2C ]
Key3 : [ KeyA, *Data3A, KeyB, *Data3B, KeyC, *Data3C ]
READ: search KeyAlpha then KeyNumeric and read *Data
KeyA : [ Key1, *Data1A, Key2, *Data2A, Key3, *Data3A ]
KeyB : [ Key1, *Data1B, Key2, *Data2B, Key3, *Data3B ]
KeyC : [ Key1, *Data1C, Key2, *Data2C, Key3, *Data3C ]
Does anyone recognize what would be the most efficient way to represent this data structure in memory?
If I understand correctly:
I’m also going to assume that the data keys are sparse, so a straight “[N][A]” array is not going to work for you.
Since you want the data to be double indexed, I’d suggest that you need some kind of linked structure: either a list or a tree.
To do it with linked lists, your C structure might look like this:
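A minimal sketch of such a structure, built from the field names used in the rest of this answer (struct stuff, num_list, alpha_list, next_num, next_alpha). The key types, the data member, and the stuff_init helper are assumptions for illustration only:

```c
#include <assert.h>
#include <stddef.h>

/* One data item, linked into both orderings at once. */
struct stuff {
    int   num_key;    /* numeric key (type is an assumption) */
    char  alpha_key;  /* alpha key (type is an assumption) */
    void *data;       /* the *Data payload */

    /* Number-first view, used by the WRITE path. */
    struct {
        struct stuff *next_num;    /* primary: head of the list for the next number */
        struct stuff *next_alpha;  /* secondary: next letter under the same number */
    } num_list;

    /* Letter-first view, used by the READ path. */
    struct {
        struct stuff *next_alpha;  /* primary: head of the list for the next letter */
        struct stuff *next_num;    /* secondary: next number under the same letter */
    } alpha_list;
};

/* Hypothetical helper: fill in the keys and start with every link unset. */
void stuff_init(struct stuff *s, int num_key, char alpha_key, void *data)
{
    assert(s != NULL);
    s->num_key = num_key;
    s->alpha_key = alpha_key;
    s->data = data;
    s->num_list.next_num = NULL;
    s->num_list.next_alpha = NULL;
    s->alpha_list.next_alpha = NULL;
    s->alpha_list.next_num = NULL;
}
```

Each item carries four links, so one allocation serves both the number-first and the letter-first ordering.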
So, if you have data items 1A, 1B, 1C, 2A, 2B, 2C, 3A, 3B, 3C, these links would work like this:
1A num_list.next_num points to 2A.
1A num_list.next_alpha points to 1B.
1A alpha_list.next_alpha points to 1B.
1A alpha_list.next_num points to 2A.
2B num_list.next_num is NULL.
2B num_list.next_alpha points to 2C.
2B alpha_list.next_alpha is NULL.
2B alpha_list.next_num points to 3B.
So, in words, num_list.next_num always points to something with the next number, but the first letter available. Similarly, alpha_list.next_alpha always points to something with the next letter, but the first number available. If you’re not looking at the head of the secondary list, then the pointer for the primary list is NULL, because you never want to traverse the data that way, and maintaining a real pointer there would either cause bugs or cause extra maintenance on insert or delete.
You can think of it as two lists of lists:
num_list.next_num is a list of the heads of the num_list.next_alpha lists.
alpha_list.next_alpha is a list of the heads of the alpha_list.next_num lists.
To find an item, you first move across one of the primary lists, num_list.next_num or alpha_list.next_alpha, and then down one of the secondary lists, num_list.next_alpha or alpha_list.next_num.
So, clearly there are some efficiency issues with this:
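A sketch of the two-step lookup makes the cost visible: both the walk across a primary list and the walk down a secondary list are linear scans. The function names, the key types, and the build_demo_grid helper are hypothetical, and the lists are assumed to be kept in ascending key order:

```c
#include <assert.h>
#include <stddef.h>

struct stuff {
    int   num_key;    /* assumed key types */
    char  alpha_key;
    void *data;
    struct { struct stuff *next_num;   struct stuff *next_alpha; } num_list;
    struct { struct stuff *next_alpha; struct stuff *next_num;   } alpha_list;
};

/* WRITE path: across num_list.next_num, then down num_list.next_alpha. */
struct stuff *find_by_num(struct stuff *num_head, int num, char alpha)
{
    struct stuff *p = num_head;
    while (p != NULL && p->num_key < num)
        p = p->num_list.next_num;
    if (p == NULL || p->num_key != num)
        return NULL;
    while (p != NULL && p->alpha_key < alpha)
        p = p->num_list.next_alpha;
    return (p != NULL && p->alpha_key == alpha) ? p : NULL;
}

/* READ path: across alpha_list.next_alpha, then down alpha_list.next_num. */
struct stuff *find_by_alpha(struct stuff *alpha_head, char alpha, int num)
{
    struct stuff *p = alpha_head;
    while (p != NULL && p->alpha_key < alpha)
        p = p->alpha_list.next_alpha;
    if (p == NULL || p->alpha_key != alpha)
        return NULL;
    while (p != NULL && p->num_key < num)
        p = p->alpha_list.next_num;
    return (p != NULL && p->num_key == num) ? p : NULL;
}

/* Hypothetical demo: wire items 1A, 1B, 2A, 2B (in that order) by hand. */
void build_demo_grid(struct stuff g[4])
{
    static const int  nums[4]   = { 1, 1, 2, 2 };
    static const char alphas[4] = { 'A', 'B', 'A', 'B' };
    assert(g != NULL);
    for (int i = 0; i < 4; i++) {
        g[i].num_key = nums[i];
        g[i].alpha_key = alphas[i];
        g[i].data = NULL;
        g[i].num_list.next_num = g[i].num_list.next_alpha = NULL;
        g[i].alpha_list.next_alpha = g[i].alpha_list.next_num = NULL;
    }
    g[0].num_list.next_num     = &g[2];  /* 1A -> 2A (head of number 2) */
    g[0].num_list.next_alpha   = &g[1];  /* 1A -> 1B */
    g[2].num_list.next_alpha   = &g[3];  /* 2A -> 2B */
    g[0].alpha_list.next_alpha = &g[1];  /* 1A -> 1B (head of letter B) */
    g[0].alpha_list.next_num   = &g[2];  /* 1A -> 2A */
    g[1].alpha_list.next_num   = &g[3];  /* 1B -> 2B */
}
```

With this wiring, item 1A is the head of both primary lists, so find_by_num(&g[0], 2, 'B') and find_by_alpha(&g[0], 'B', 2) reach the same item by different routes.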
If you are dealing with large quantities of data I would do two things:
Use some kind of balanced tree instead of flat lists. The ‘heads of the lists’ then becomes the ‘roots of the trees’.
Allocate a fixed-size array of struct stuff and use array indexes as the links, instead of pointers. Then simply maintain a “free list” of unused slots. If your data outgrows the array, then use realloc or allocate a second memory block and remember which indexes lie in which block.
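The index-based pool could be sketched roughly as follows. This is one possible arrangement, not a definitive implementation: the names struct pool, NO_LINK, pool_grow, pool_alloc and pool_free are hypothetical, the free list is threaded through num_list.next_num of unused slots, and growth uses the realloc option rather than a second block:

```c
#include <assert.h>
#include <stdlib.h>

#define NO_LINK ((size_t)-1)    /* plays the role of NULL for index links */

struct stuff {
    int   num_key;              /* assumed key types */
    char  alpha_key;
    void *data;
    struct { size_t next_num;   size_t next_alpha; } num_list;
    struct { size_t next_alpha; size_t next_num;   } alpha_list;
};

struct pool {
    struct stuff *slots;        /* the array of items */
    size_t capacity;            /* number of slots in the array */
    size_t free_head;           /* first unused slot, or NO_LINK */
};

/* Grow the array; indexes stay valid even if realloc moves the block. */
int pool_grow(struct pool *p, size_t new_cap)
{
    assert(new_cap > p->capacity);
    struct stuff *s = realloc(p->slots, new_cap * sizeof *s);
    if (s == NULL)
        return -1;              /* old array is still intact */
    /* Chain the new slots together and put them at the front of the free list. */
    for (size_t i = p->capacity; i < new_cap; i++)
        s[i].num_list.next_num = (i + 1 < new_cap) ? i + 1 : p->free_head;
    p->free_head = p->capacity;
    p->slots = s;
    p->capacity = new_cap;
    return 0;
}

/* Take a slot off the free list, growing the array if none is left. */
size_t pool_alloc(struct pool *p)
{
    if (p->free_head == NO_LINK) {
        size_t new_cap = p->capacity ? p->capacity * 2 : 8;
        if (pool_grow(p, new_cap) != 0)
            return NO_LINK;
    }
    size_t i = p->free_head;
    struct stuff *s = &p->slots[i];
    p->free_head = s->num_list.next_num;
    s->num_key = 0;
    s->alpha_key = 0;
    s->data = NULL;
    s->num_list.next_num = s->num_list.next_alpha = NO_LINK;
    s->alpha_list.next_alpha = s->alpha_list.next_num = NO_LINK;
    return i;
}

/* Return a slot to the free list; the caller must have unlinked it first. */
void pool_free(struct pool *p, size_t i)
{
    assert(i < p->capacity);
    p->slots[i].num_list.next_num = p->free_head;
    p->free_head = i;
}
```

A pool starts out as struct pool p = { NULL, 0, NO_LINK }; and items are then addressed as p.slots[i]. Because links are indexes rather than pointers, nothing needs to be patched when realloc relocates the array.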