Consider a function `minout :: [Int] -> Int` which takes a list of distinct nonnegative integers, and returns the smallest nonnegative integer not present in the list. The behaviour of the function if the input has duplicates does not matter. Can this be implemented in linear time, using only lists (no arrays or vectors or other data structures with efficient random access)?
(This came up here.)
If `l` has all numbers between `0` and `(length l) - 1` inclusive, then `minout l` is `length l`; otherwise, it lies in `[0..(length l - 1)]`. So `minout l` always lies in `[0..(length l)]`, and only the elements of `l` which are in `[0..(length l - 1)]` are relevant. We can discard the remaining elements. Using this idea we can implement a linear-time divide-and-conquer solution. Unlike in merge sort, at each step of the recursion we recurse into only one of two sublists, each of which is at most half the size of the original (after doing some linear work). This gives a linear time complexity.

The solution is built around a helper, `minoutaux`, which, given a “base” integer and a list with distinct entries, returns the smallest integer which is at least `base` and does not occur in the list. Write `n` for the length of the list and `n2` for roughly half of `n`. The helper discards the “irrelevant” elements, as explained earlier, and generates two lists: those numbers which lie in `[base, base + n2)` (called `smallpart`), and those which lie in `[base + n2, base + n)` (called `bigpart`). Each of these lists will have length at most `n2`. If `length smallpart == n2`, then `smallpart` has all numbers in `[base, base + n2)`, and so the answer must lie in `bigpart`; otherwise, there is a “gap” in `smallpart` itself, so the answer lies in `smallpart`.

Why does this run in linear time? First, the whole list of length N is traversed a few times, which needs some 10N operations, let’s say. Then `minoutaux` is called on a smaller list, of size at most N/2. So we have (at most) 10N/2 more operations. Then 10N/4, 10N/8, and so on. Adding all these, we get a bound of 10(2N) = 20N. (The constant 10 is used only as an example.)

Here we are traversing the list multiple times to compute its length, compute `smallpart`, compute `bigpart`, and so on. One could optimize that fairly easily by doing all this in a single pass. However, this is still a linear-time solution, and I wanted to make the code clearer rather than optimize constant factors.

This question and its solution are not original to me; I came across them in class when I learnt Haskell.
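
For reference, here is a minimal sketch of the divide-and-conquer described above, reconstructed from the prose. The names `minoutaux`, `smallpart`, `bigpart`, `n` and `n2` follow the text. Taking `n2` as `n / 2` rounded up and using the empty list as the only base case are assumptions made for this sketch, chosen so that both sublists have length at most `n2` and the recursion always shrinks.

```haskell
-- | Smallest nonnegative integer not present in a list of
--   distinct nonnegative integers.
minout :: [Int] -> Int
minout = minoutaux 0

-- | Smallest integer >= base that does not occur in the list.
--   Assumes the entries of the list are distinct.
minoutaux :: Int -> [Int] -> Int
minoutaux base [] = base
minoutaux base xs
  -- smallpart is full: every number in [base, base + n2) is present,
  -- so the answer is at least base + n2.
  | length smallpart == n2 = minoutaux (base + n2) bigpart
  -- Otherwise there is a gap inside [base, base + n2).
  | otherwise              = minoutaux base smallpart
  where
    n  = length xs
    n2 = (n + 1) `div` 2   -- assumption: half of n, rounded up
    -- Elements outside [base, base + n) are irrelevant and are dropped.
    smallpart = [x | x <- xs, base <= x, x < base + n2]
    bigpart   = [x | x <- xs, base + n2 <= x, x < base + n]
```

For example, `minout [3, 0, 2, 4]` evaluates to `1`. The list is traversed several times per call (once for `length`, once per comprehension), which matches the constant-factor remark above.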