Please read this question carefully before closing it as a duplicate, though if it is an honest duplicate I’ll be happy to know about it. This is a generalization of Find any one of multiple possible repeated integers in a list.
Given any set S of N distinct
integers, and any array A of
length N+1 each entry of which is
taken from S, what is the best
algorithm to find some (there must be
at least one) repeated entry of A?
NOTE: There could be multiple repeated entries in A, and any entry could be repeated multiple times.
As Nemo points out, the trivial solution takes O(1) space and O(N^2) time. I’m looking for a solution that improves the time without compromising the space too much. To be precise, the solution(s) I’m looking for:
- Returns a value a that appears in A at least twice,
- Uses at most O(log N) space without modifying A, and
- Takes less than O(N^2) time
EDIT: The set S is there to ensure that the array A has at least one repeated entry. For this problem, do not assume that you have S given to you as an ordered set. You can query S (boolean to return true is s in S and false otherwise) and you can query A (call for A[i]), but that’s all. Any solution that sorts A or S exceeds the space limit.
This generalization invalidates my pointer solution to the original question (which has O(1) space and O(N) time), and the space constraint I’m imposing invalidates fiver’s solution (which has O(N) space and time).
This algorithm is similar to Justin Simon’s, but the key point is how to compute the median (or the kth element) of S using just O(1) space efficiently.
Here is that key algorithm, which is randomized:
Set lower equal to the minimum element of S and upper equal to the maximum element of S. Pick a random element x from S that is between lower and upper (this costs at most O(n) expected time). Compute the rank of x (O(n) time). If x’s rank is too low, set lower to the successor of x (O(n) time), else set upper equal to the predecessor of x (O(n) time). Repeat until lower equals upper.
Note that each iteration costs O(n) in expectation and there are O(lg n) iterations in expectation so the expected time cost is O(n lg n) and space usage is O(1) since we only store lower and upper.
Using this ability to select the kth element, we can then use the pigeonhole principle as suggested in the original question to find increasingly small segments of S that contain too many elements to all be distinct, using O(lg n) linear scans of A and O(1) space to store the relevant sums of elements in each region. Each such iteration costs O(n) in addition to the O(n lg n) cost of finding the kth element, and there are O(lg n) iterations, so the total cost is O(n lg^2 n).