From the book Programming Pearls 2nd Edition, quoting the question A from Column 2, section 2.1
Given a sequential file that contains at most four billion integers in
random order, find a 32-bit integer that isn’t in the file (and there
must be at least one missing – why ?). How would you solve this
problem with ample quantities of main memory ? How would you solve it
if you could use several external “scratch” files but only a few
hundred bytes of main memory ?
The problem statement says “at most” four billion integers. So one valid input could be in the range 100 – 299 with one missing number. If this understanding of the problem statement is correct then the required inputs for this problem are the file containing the numbers and also the range of the numbers in the file, ie: i to n.
For this problem isn’t the following O(n) solution more intuitive than the one given in the book (from Ed Reingold) ? Or am I missing something ?
Assume the given range is i…n
using the forumla (n * (n + 1) / 2)
x = the sum of numbers from 1 to i-1
y = the sum of numbers from 1 to n
walk through the input and get a sum of the numbers (value z)
missing number = (y - x - z)
You are missing something:
assumes unique elements and only one missing in the range
much more then 4B elements (to be exact, there are 294967296 more numbers that can be represented by 32 bits then 4B)
Look at the simplified counter example with
range = [1,5], array = [5,5,5,4,1].In this case, you will get
x=1, y = 15, z = 20.However,
20-15-1 = 4, and it is not missing.You can use radix sort, which runs in
O(n)in this case (because 32 bits is constant), and then scan the sorted array to find the first missing element.EDIT: A more efficient way to do it is yet with a variation of radix sort and selection algorithm:
k. Look at the first bit, and partition the array into two parts, the first with this bit unset, and the second with this bit set.(2^k) / 2 = 2^(k-1)elements.k' = k-1(reduce the current number of bits) and return to 1.Keep doing it until you exhausted your bits, and you will get a number that is not in the original list.
Note that assuming the list is random enough, the complexity of the algorithm is
O(n)– for any number of bits (and notO(n*k))