We have been given an array of size N that contains integers in the range 0 to N-2, both inclusive.
The array can have multiple repeated entries. We need to find one of the duplicated entries in O(N) time and constant space.
I was thinking of taking the product and sum of all the entries in the array, and the product and sum of all the numbers in the range 0 to N-2.
Then, the difference of the sums and the division of the products would give us two equations. This approach would work if it were given that there are only two repeated entries, but since there can be more than two, I think my approach fails.
Any suggestions?
Edit: The array is immutable. I realize that this is an important piece of information and I apologize that I forgot to include this earlier.
Here’s a nice treatment. It passes through some easier problems before addressing this one.
http://aperiodic.net/phil/archives/Geekery/find-duplicate-elements.html
It contains a solution for when you can modify the input array, and another for when you can’t.
Brief summary in case the link ever dies: the array indexes run from 0 .. N-1, and the array values run 0 .. N-2. Each array element can therefore be considered as an index (or “pointer”) into the array itself: element i “points to” element ra[i], ra[i] points to ra[ra[i]], and so on. By repeatedly following these pointers, we must eventually enter a cycle, because we certainly can’t go forever without revisiting some node or other.
Now, the very last element, N-1, isn’t pointed to by any other element, because no value in the array is as large as N-1. So if we start there and eventually enter a cycle, somewhere along the way there must be an element which can be reached from two different places: the route we took the first time, and the route which is part of the cycle. Picture a path that leaves N-1 and runs into a loop, joining it at some element a2.
In this case, a2 is reachable from two different places.
But a node which is reachable from two different places is precisely what we’re looking for, a duplicate in the array (two different array elements containing the same value).
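To make the pointer-following picture concrete, here is a small Python sketch that traces the walk from index N-1 and reports the first index it visits twice. The function name trace_walk and the sample array are made up for illustration, and the seen set uses O(N) extra space, so this only shows the shape of the path; it is not the constant-space solution.

```python
def trace_walk(ra):
    """Follow ra[i] pointers from index N-1 until an index repeats.

    Illustration only: the `seen` set costs O(N) extra space.
    """
    path = []
    seen = set()
    i = len(ra) - 1          # N-1 is never pointed to, so it is off the cycle
    while i not in seen:
        seen.add(i)
        path.append(i)
        i = ra[i]            # follow the "pointer"
    return path, i           # i is the first index reached a second time


# Hypothetical input: N = 5, values in 0..3, with 1 stored twice.
path, entry = trace_walk([1, 3, 0, 1, 2])
print(path, entry)
```

For this input the walk visits 4, 2, 0, 1, 3 and then returns to index 1, which is reached both from index 0 and from index 3. Both of those elements contain the value 1, so 1 is a duplicate.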
The question then is how to identify a2, and the answer is to use Floyd’s cycle-finding algorithm. In particular it tells us the “start” of the loop in O(N) time and O(1) space.
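A minimal Python sketch of that idea, assuming the array is called ra as above and has at least two elements. The name find_duplicate is my own, and this is one common way to write Floyd's tortoise-and-hare, not necessarily the exact code from the linked page. It only reads from ra, so it works on an immutable array.

```python
def find_duplicate(ra):
    """Return one duplicated value from ra, where len(ra) == N and
    every value lies in 0..N-2. Uses O(N) time and O(1) extra space."""
    start = len(ra) - 1      # index N-1 has no incoming pointer

    # Phase 1: the tortoise moves one step and the hare two steps
    # per iteration until they meet somewhere inside the cycle.
    slow = ra[start]
    fast = ra[ra[start]]
    while slow != fast:
        slow = ra[slow]
        fast = ra[ra[fast]]

    # Phase 2: restart one pointer at the start and advance both one
    # step at a time; they meet at the entry of the cycle.
    slow = start
    while slow != fast:
        slow = ra[slow]
        fast = ra[fast]

    # The cycle entry is an index that two different elements point to,
    # so it is a value that appears at least twice in ra.
    return slow


# A tuple stands in for the immutable array (hypothetical sample data).
print(find_duplicate((1, 3, 0, 1, 2)))
```

When several values are duplicated, this returns whichever one sits at the entry of the cycle reached from N-1, which is fine since the question asks for any one duplicate.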