I’m looking for an ordered data structure which allows very fast insertion. That’s the only property required. Data will only be accessed and deleted from the top element.
To be more precise, I need two structures:
1) The first structure should allow an ordered insertion using an int value. On completing the insertion, it shall report the rank of the inserted element.
2) The second structure should allow insertion at a specified rank.
The number of elements to be stored is likely to be in the thousands, or tens of thousands.
[edit] I must amend the volume hypothesis: even though, at any moment, the size of the ordered structure is likely to be in the range of tens of thousands, the total number of insertions is likely to be in the tens of millions per run.
Insertion time in O(1) would be nice, although O(log(log(n))) is very acceptable too. Currently I've got some interesting candidates for the first structure only, but they are either in O(log(n)), or lack the capability to report the insertion rank (which is mandatory).
What about a form of skip list, specifically the "indexed skiplist" in the linked article? That should give O(lg N) insert and lookup, and O(1) access to the first node for both your use cases.
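To make the suggestion concrete, here is a minimal sketch of an indexed skip list whose insert reports the rank. It is not taken from the linked article: the names IndexedSkipList, insert and pop_first, the max_levels and seed parameters, and the choice to define rank as the number of strictly smaller keys are all assumptions made for illustration. Each link stores a width (how many bottom-level steps it covers), and summing the widths walked during the search gives the rank.

```python
import random


class _Node:
    __slots__ = ("key", "next", "width")

    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height   # successor on each level
        self.width = [0] * height     # bottom-level steps covered by each link


class IndexedSkipList:
    """Hypothetical sketch: ordered insert that returns the rank."""

    def __init__(self, max_levels=16, seed=None):
        self.max_levels = max_levels
        self.size = 0
        self._rng = random.Random(seed)
        self._nil = _Node(float("inf"), 0)        # sentinel above every int key
        self._head = _Node(float("-inf"), max_levels)
        self._head.next = [self._nil] * max_levels
        self._head.width = [1] * max_levels

    def insert(self, key):
        """Insert key and return its 0-based rank (count of smaller keys)."""
        chain = [None] * self.max_levels          # last node visited per level
        steps_at = [0] * self.max_levels          # widths walked per level
        node = self._head
        for level in reversed(range(self.max_levels)):
            while node.next[level].key < key:
                steps_at[level] += node.width[level]
                node = node.next[level]
            chain[level] = node

        # Coin flips give an expected O(log n) height.
        height = 1
        while height < self.max_levels and self._rng.random() < 0.5:
            height += 1

        new = _Node(key, height)
        steps = 0
        for level in range(height):
            prev = chain[level]
            new.next[level] = prev.next[level]
            prev.next[level] = new
            new.width[level] = prev.width[level] - steps
            prev.width[level] = steps + 1
            steps += steps_at[level]
        # Links that jump over the new node now cover one more element.
        for level in range(height, self.max_levels):
            chain[level].width[level] += 1
        self.size += 1
        return sum(steps_at)

    def pop_first(self):
        """Remove and return the smallest key (the top element)."""
        if self.size == 0:
            raise IndexError("pop from empty skip list")
        first = self._head.next[0]
        for level in range(self.max_levels):
            if self._head.next[level] is first:
                self._head.next[level] = first.next[level]
                self._head.width[level] = first.width[level]
            else:
                self._head.width[level] -= 1
        self.size -= 1
        return first.key


sl = IndexedSkipList(seed=1)
print(sl.insert(50))   # 0
print(sl.insert(10))   # 0
print(sl.insert(30))   # 1
print(sl.pop_first())  # 10
```

Reading the first node is a single pointer dereference; removing it in this sketch touches one link per level, so it costs O(max_levels) rather than strictly O(1).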
–Edit–
When I think of O(1) algorithms, I think of radix-based methods. Here is an O(1) insert with rank returned. The idea is to break the key up into nibbles, and keep a count of all the inserted items which share each prefix. Unfortunately, the constant is high (<= 64 dereferences and additions), and the storage is O(2 x 2^INT_BITS), which is awful. This is the version for 16-bit ints; expanding to 32 bits should be straightforward.
This structure also supports O(1) GetMin and RemoveMin. (GetMin is instant, Remove has a constant similar to Insert.)
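The following is a minimal sketch of that nibble-counter idea for unsigned 16-bit keys, reconstructed from the description rather than copied from any original listing. The class name NibbleCounter16, the method names, and the convention that rank means "number of stored keys strictly smaller than the new one" are assumptions. It stores only counts (a multiset of keys), and its pop_min scans up to 16 counters per level instead of caching the minimum.

```python
class NibbleCounter16:
    """Hypothetical sketch: per-prefix counters over 4 nibbles of a 16-bit key."""

    def __init__(self):
        # levels[d][prefix] = number of stored keys whose top (d + 1)
        # nibbles equal prefix; sizes are 16, 256, 4096 and 65536.
        self.levels = [[0] * (16 ** (d + 1)) for d in range(4)]
        self.size = 0

    def insert(self, key):
        """Insert key and return the count of strictly smaller stored keys."""
        if not 0 <= key <= 0xFFFF:
            raise ValueError("key must fit in 16 unsigned bits")
        rank = 0
        for d in range(4):
            prefix = key >> (4 * (3 - d))   # top d + 1 nibbles of the key
            base = prefix & ~0xF            # first sibling under the same parent
            counts = self.levels[d]
            # At most 15 smaller siblings per level, so at most 60 additions.
            for p in range(base, prefix):
                rank += counts[p]
            counts[prefix] += 1
        self.size += 1
        return rank

    def pop_min(self):
        """Remove and return the smallest stored key."""
        if self.size == 0:
            raise IndexError("pop from empty structure")
        prefix = 0
        for d in range(4):
            counts = self.levels[d]
            base = prefix << 4
            # A non-empty parent always has at least one non-empty child.
            prefix = next(p for p in range(base, base + 16) if counts[p])
            counts[prefix] -= 1
        self.size -= 1
        return prefix


s = NibbleCounter16()
print(s.insert(500))  # 0
print(s.insert(100))  # 0
print(s.insert(300))  # 1
print(s.pop_min())    # 100
```

The work per operation is bounded by the key width, not by the number of stored elements, which is where the O(1) claim comes from; the price is the counter tables, which are allocated up front whether or not the keys use them.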
If your data is sparse and well distributed, you could remove the p4 counter, and instead do an insertion sort into the P3 level. That would reduce storage costs by a factor of 16, at the cost of a higher worst-case insert when there are many similar values.
Another idea to improve the storage would be to combine this idea with something like an extendible hash. Use the integer key as the hash value, and keep a count of the inserted nodes in the directory. Doing a sum over the relevant directory entries on an insert (as above) should still be O(1) with a large constant, but the storage would reduce to O(N).
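As a rough illustration of the first variation (dropping the last counter level and insertion-sorting into buckets at the P3 level), here is a sketch for 16-bit keys. The class name BucketedNibbleCounter16, the use of a dict to create buckets lazily, and the strictly-smaller rank convention are assumptions for the example; the extendible-hash variant is not shown.

```python
from bisect import bisect_left


class BucketedNibbleCounter16:
    """Hypothetical sketch: three counter levels plus sorted buckets."""

    def __init__(self):
        # Counters for the top 1, 2 and 3 nibbles (16, 256 and 4096 entries).
        self.levels = [[0] * (16 ** (d + 1)) for d in range(3)]
        # Top-12-bit prefix -> sorted list of keys, created on first use.
        self.buckets = {}

    def insert(self, key):
        """Insert key and return the count of strictly smaller stored keys."""
        if not 0 <= key <= 0xFFFF:
            raise ValueError("key must fit in 16 unsigned bits")
        rank = 0
        for d in range(3):
            prefix = key >> (4 * (3 - d))   # top d + 1 nibbles of the key
            base = prefix & ~0xF            # first sibling under the same parent
            counts = self.levels[d]
            rank += sum(counts[base:prefix])
            counts[prefix] += 1
        # Insertion sort into the bucket; this step grows with bucket size,
        # which is the worse worst case when many keys are similar.
        bucket = self.buckets.setdefault(key >> 4, [])
        pos = bisect_left(bucket, key)
        bucket.insert(pos, key)
        return rank + pos


b = BucketedNibbleCounter16()
print(b.insert(500))  # 0
print(b.insert(100))  # 0
print(b.insert(300))  # 1
```

With well-spread keys each bucket stays tiny and the insert is dominated by the three counter levels; with heavy duplication a single bucket can hold most of the data and the list insert becomes linear in its length.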