I have a set of double-precision values, and I need the list of them to stay sorted at all times. What is the best algorithm for keeping the data sorted as it is being added?
By "best" I mean the lowest Big-O in the data count, the lowest Small-O in the data count (worst-case scenario), and the lowest Small-O in the space needed, in that order if possible.
The set size varies widely, from a small number (30) to a lot of data (10M+).
Building a self-balancing binary search tree, such as a red-black tree or an AVL tree, allows for Θ(lg n) insertion and removal, and Θ(n) retrieval of all elements in sorted order (by doing an in-order depth-first traversal), with Θ(n) memory usage. These trees are somewhat complex to implement, but they’re efficient, and most languages have library implementations, so they’re a good first choice in most cases.
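To make the insertion and sorted traversal concrete, here is a minimal AVL sketch in Python. It is an illustration only, not a library API: the names `Node`, `insert`, and `in_order` are made up for this example, removal is omitted, and duplicates are simply sent to the right subtree.

```python
class Node:
    __slots__ = ("key", "left", "right", "height")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def insert(node, key):
    """Insert key (duplicates allowed) and return the new subtree root."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:  # left-heavy
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)  # left-right case
        return rotate_right(node)
    if balance < -1:  # right-heavy
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)  # right-left case
        return rotate_left(node)
    return node


def in_order(node):
    """Yield keys in sorted order using an iterative depth-first traversal."""
    stack = []
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.key
        node = node.right


root = None
for x in [3.5, -1.0, 2.25, 10.0, 2.25, 0.0]:
    root = insert(root, x)
print(list(in_order(root)))  # [-1.0, 0.0, 2.25, 2.25, 3.5, 10.0]
```

In practice a library implementation is usually preferable to hand-rolled code like this; the sketch only shows where the Θ(lg n) insertion cost comes from (one root-to-leaf path plus a constant number of rotations per level).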
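Additionally, retrieving the i-th element can be done by annotating each edge (or, equivalently, each node) in the tree with the total number of nodes below it. One can then find the i-th element in Θ(lg n) time and Θ(1) space by descending from the root and using those counts to decide, at each node, whether the target lies in the left subtree, at the node itself, or in the right subtree.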
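The lookup can be sketched as follows, in Python. This is a sketch under stated assumptions rather than any particular library's code: each node stores a `size` field counting the nodes in its subtree including itself, `build_balanced` is a hypothetical helper that builds a balanced tree from an already-sorted list just for the demo, and a real self-balancing tree would also have to update `size` on every insertion, removal, and rotation.

```python
class Node:
    __slots__ = ("key", "left", "right", "size")

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        # Number of nodes in the subtree rooted here, this node included.
        self.size = 1 + size(left) + size(right)


def size(node):
    return node.size if node else 0


def build_balanced(sorted_keys, lo=0, hi=None):
    """Build a balanced tree from an already-sorted list (demo helper only)."""
    if hi is None:
        hi = len(sorted_keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(sorted_keys[mid],
                build_balanced(sorted_keys, lo, mid),
                build_balanced(sorted_keys, mid + 1, hi))


def select(root, i):
    """Return the i-th smallest key (0-based) using the size annotations."""
    node = root
    while node is not None:
        left_size = size(node.left)
        if i < left_size:
            node = node.left          # target is in the left subtree
        elif i == left_size:
            return node.key           # exactly left_size keys are smaller
        else:
            i -= left_size + 1        # skip the left subtree and this node
            node = node.right
    raise IndexError("index out of range")


data = sorted([3.5, -1.0, 2.25, 10.0, 0.0])
root = build_balanced(data)
print(select(root, 2))  # 2.25
```

The loop keeps only a node reference and an index, which is where the Θ(1) extra space comes from; the number of iterations is bounded by the height of the tree.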
An implementation that supports this can be found in Debian’s libavl; unfortunately, the maintainer’s site seems to be down, but the library can still be retrieved from Debian’s servers.