Given a range of numbers, say 1 to 10,000, the input arrives in random order.
Constraint:
At any point only 1000 numbers can be loaded to memory.
Assumption:
Assuming unique numbers.
I propose the following efficient “When-Required-sort Algorithm”.
We write the numbers, in their original random order, into files that are each designated to hold a particular range of numbers. For example, File1 holds 0 – 999, File2 holds 1000 – 1999, and so on (10,000 itself needs an eleventh file).
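A minimal sketch of the partitioning step, assuming the input can be streamed one number at a time and each bucket file stores one number per line. The names `partition`, `bucket_path`, and the `file{k}.txt` naming scheme are hypothetical illustrations, not part of the proposal.

```python
import os

BUCKET_WIDTH = 1000  # each file covers 1000 consecutive values, as in the proposal

def bucket_path(directory, n):
    # Hypothetical naming: file1.txt holds 0-999, file2.txt holds 1000-1999, ...
    return os.path.join(directory, f"file{n // BUCKET_WIDTH + 1}.txt")

def partition(numbers, directory):
    """Stream numbers into range-based bucket files, one per line, left unsorted."""
    os.makedirs(directory, exist_ok=True)
    handles = {}  # one open handle per bucket file (11 at most for 1..10,000)
    try:
        for n in numbers:  # only the current number is held in memory
            path = bucket_path(directory, n)
            if path not in handles:
                handles[path] = open(path, "a")
            handles[path].write(f"{n}\n")
    finally:
        for h in handles.values():
            h.close()
```

Because `numbers` can be any iterator (for example, lines read lazily from the input file), this step never needs more than one number in memory at a time.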
If a particular number, say “2535”, is being searched for, we know it is in File3 (a binary search over the ranges finds the file). File3 is then loaded into memory and sorted, say with quicksort (optimized to switch to insertion sort when a subarray is small), and the number is located in the sorted array with binary search. When the search is done, we write the sorted file back.
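The search step can be sketched as follows, assuming the same one-number-per-line bucket files. Since every range has the same width, this sketch computes the file index directly as `n // 1000 + 1` instead of binary-searching over the ranges, and it uses Python's built-in sort in place of the tuned quicksort. `search` and the `file{k}.txt` naming are hypothetical.

```python
import bisect
import os

BUCKET_WIDTH = 1000  # values per file, as in the proposal

def search(directory, n):
    """Return True if n is stored; sorts its bucket file lazily on first search."""
    # Hypothetical naming: file3.txt holds 2000-2999, so 2535 maps to it.
    path = os.path.join(directory, f"file{n // BUCKET_WIDTH + 1}.txt")
    if not os.path.exists(path):
        return False
    with open(path) as f:
        bucket = [int(line) for line in f]  # at most 1000 numbers: fits the limit
    # Only sort and rewrite the file if it is not already sorted.
    if any(bucket[i] > bucket[i + 1] for i in range(len(bucket) - 1)):
        bucket.sort()  # built-in sort stands in for quicksort + insertion sort
        with open(path, "w") as f:
            f.writelines(f"{x}\n" for x in bucket)  # write the sorted bucket back
    i = bisect.bisect_left(bucket, n)  # binary search in the sorted bucket
    return i < len(bucket) and bucket[i] == n
```

The sortedness check means a bucket is rewritten only on the first search after it changes; later searches just read it and binary-search.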
So, in the long run, all the numbers will end up sorted.
Please comment on this proposal.
It’s called bucket sort.
Another approach when main memory is limited is external merge sort: sort chunks that fit in memory, write each out as a sorted run, then merge the runs.
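A minimal sketch of external merge sort under the 1000-number limit: phase one sorts chunks of at most 1000 numbers into temporary run files, and phase two does a k-way merge with `heapq.merge`, which keeps only the current head of each run in its heap. `external_merge_sort` and the run-file names are hypothetical.

```python
import heapq
import os
import tempfile
from itertools import islice

RUN_SIZE = 1000  # memory limit from the problem statement

def _write_run(run, directory, index):
    path = os.path.join(directory, f"run{index}.txt")
    with open(path, "w") as f:
        f.writelines(f"{x}\n" for x in sorted(run))  # one sorted run per file
    return path

def _read_run(path):
    with open(path) as f:
        for line in f:
            yield int(line)  # stream the run back one number at a time

def external_merge_sort(numbers, out_path):
    """Phase 1: sort RUN_SIZE chunks into run files. Phase 2: k-way merge them."""
    it = iter(numbers)
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        while True:
            run = list(islice(it, RUN_SIZE))  # never more than RUN_SIZE in memory
            if not run:
                break
            paths.append(_write_run(run, tmp, len(paths)))
        with open(out_path, "w") as out:
            for x in heapq.merge(*(_read_run(p) for p in paths)):
                out.write(f"{x}\n")
```

For 10,000 numbers this produces 10 runs, so a single merge pass holding 10 heads suffices. Unlike the bucket approach, it sorts everything up front rather than on demand.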
The part of your design where you sort each bucket on demand may be better described as “on demand”, “just-in-time”, or “lazy”. Might as well reuse nomenclature people are already familiar with instead of inventing the term “When-required-sort”.
Have you considered how to handle additional input? What happens if some of the buckets are already sorted, and then more numbers are added?
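One possible answer, sketched with in-memory lists standing in for the bucket files: new numbers are appended cheaply and the bucket is marked dirty, so it is re-sorted lazily on its next search. `LazyBuckets`, `add`, `contains`, and `dirty` are hypothetical names, and duplicate checks are omitted because the numbers are assumed unique.

```python
import bisect

class LazyBuckets:
    """Hypothetical in-memory stand-in for the bucket files."""

    def __init__(self, width=1000):
        self.width = width
        self.buckets = {}   # bucket index -> list of numbers
        self.dirty = set()  # indices of buckets that may be unsorted

    def add(self, n):
        i = n // self.width
        self.buckets.setdefault(i, []).append(n)  # cheap append, no re-sort yet
        self.dirty.add(i)                         # invalidate any earlier sort

    def contains(self, n):
        i = n // self.width
        bucket = self.buckets.get(i, [])
        if i in self.dirty:
            bucket.sort()            # sort lazily, only when this bucket is searched
            self.dirty.discard(i)
        j = bisect.bisect_left(bucket, n)  # binary search in the sorted bucket
        return j < len(bucket) and bucket[j] == n
```

An alternative is to keep an already-sorted bucket sorted on insert with `bisect.insort`, which costs O(m) per insert because later elements are shifted.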
I assume the end goal is to identify if a number is included in the set, rather than to produce a sorted list. If you do this frequently there is benefit to the initial overhead of sorting a bucket. If infrequently, a linear scan of the appropriate bucket may suffice.
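A rough way to see the trade-off, counting comparisons only, for a bucket of m numbers searched k times (m = 1000 here). Sorting once costs about m log₂ m plus log₂ m per search, while a linear scan costs up to m per search (a miss; a hit averages about m/2):

```latex
C_{\mathrm{sort}}(k) \approx m\log_2 m + k\log_2 m, \qquad C_{\mathrm{scan}}(k) \approx k\,m

C_{\mathrm{sort}}(k) \le C_{\mathrm{scan}}(k)
  \iff k \ge \frac{m\log_2 m}{m - \log_2 m}
  \approx \frac{1000 \times 9.97}{1000 - 9.97} \approx 10
```

This estimate ignores I/O. If the bucket is reloaded from disk for every search, both strategies read all m numbers each time, so the binary search only saves CPU comparisons, which strengthens the case for a plain linear scan when searches are infrequent.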
One more alternative: bucket sort can be thought of as a simplistic hash table whose hash function is n/1000 (integer division). Collisions are expected, since a large number of values (up to 1000) can hash into each bucket. Instead of using on-demand sorting (and then binary search) to resolve collisions, you could use a more sophisticated hash and get (expected) O(1) search performance.
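One possible “more sophisticated hash”, sketched under the assumption that the value range is known and the numbers are unique: keep n // 1000 to choose the file, and use n % 1000 as a direct slot index inside it. This is direct addressing (a perfect hash for this bounded range), so a lookup is one seek and a one-byte read. `build`, `contains`, and the `slots{k}.bin` naming are hypothetical; a general hash over an unbounded range would still need collision handling.

```python
import os

WIDTH = 1000  # values per file, matching the n // 1000 bucket hash

def _path(directory, n):
    return os.path.join(directory, f"slots{n // WIDTH}.bin")  # hypothetical naming

def build(numbers, directory):
    """Mark each number's slot; its offset inside the file is n % WIDTH."""
    os.makedirs(directory, exist_ok=True)
    for n in numbers:
        path = _path(directory, n)
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(bytes(WIDTH))  # WIDTH zero bytes: all slots empty
        with open(path, "r+b") as f:
            f.seek(n % WIDTH)
            f.write(b"\x01")  # mark the slot as present, in place

def contains(directory, n):
    """O(1) lookup: one seek and a one-byte read, no sorting needed."""
    path = _path(directory, n)
    if not os.path.exists(path):
        return False
    with open(path, "rb") as f:
        f.seek(n % WIDTH)
        return f.read(1) == b"\x01"
```

Each file is only 1000 bytes, so the memory constraint is never an issue, and no bucket ever needs sorting.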