My goal is to create a data structure implementing IList<T> interface which will achieve O(1) element lookup time by compromising memory.
Background
As you know all array based IList<T> implementations like List<T> have O(n) element lookup time. It means that the operations like int IndexOf(T element) or bool Contains(T element) iterate through the underlaying array until they find a match.
Well known idea achive that is to use a combination of a list and hashtable as underlaying data structures. Values are kept in a list. The hashtable will keep indexes as values and values of list as keys. So lookup can be performed using hashtable.
That’s exactly how the KeyedCollection<TKey, TItem> see MSDN is implemented.
What I have tried so far
internal class MyList<T> : KeyedCollection<T, T>
{
protected override T GetKeyForItem(T item)
{
return item;
}
}
This worked so far except one problem. This data structure does not mimic the behavior expected behind List<T> exactly. The point is that the List<T> allows duplicates, MyList does not.
Question
Is there any ready to use data structure or can you recommend an elegant way of implementing the IList<T> so that:
- Lookup operations have
O(1)time. - All other operations have same
O()performance asList<T> - Memory can be compromised by hashtable overhead (
constantA + constantB * nbytes). - Duplicates must be allowed
- Allowing nulls is optional (they can be boxed into null objects)
Building on what
Ryan Bennetthas proposed, I think the best you are going to come up with (as you state order is important) is to create a class that implements IList and then internally has something like this:Edit:
Just realized that things get real sticky here when you do a
Remove. You will have to walk the collection of indices and update any index with a value > the index of the item you remove. You have now increased the “remove” time. You have also made it tricky to get correct. I would throw a massive amount of unit tests around this collection if you were going to try to implement something like this.I know you are stating order is important so I am assuming that is why you are not going wtih a Sorted list approach which would allow the duplicates and give you O(log n) operation times.
Edit 2: Another book keeping type approach
I am just bouncing this one around in my head and so I will only give some rough pseudo code but you could possibly take an approach where you just have a dictionary of items mapped to a list of indices and a second dictionary that maps indices to items. If you add the restriction that T is a class then you are only paying the overhead of two storages of the reference. You then need to maintain a current “last” so that you can easily add a new item into the collection. This should make the remove operation a bit cleaner. It is still O(n) because you have to update anything with an index > the removed item. In first imaginings, this seems like a potential solution that will get you close to what you want to achieve (if I am understanding the goals correctly).