Assume that the array has integers between 1 and 1,000,000.
I know some popular ways of solving this problem:
- If all numbers between 1 and 1,000,000 are included, find the sum of the array elements and subtract the expected total n(n+1)/2 from it (here n = 1,000,000); the difference is the duplicated number
- Use a hash map (needs extra memory)
- Use a bit map (less memory overhead)
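For comparison, here is a minimal Python sketch of the three approaches listed above. It assumes the array holds every value 1..n exactly once plus one extra copy of a single value (so its length is n + 1); the function names and the `max_value` parameter are illustrative, not taken from the original question or any library.

```python
def find_duplicate_by_sum(nums):
    # Assumes nums contains every value 1..n once plus one extra copy,
    # so len(nums) == n + 1 and the surplus over n(n+1)/2 is the duplicate.
    n = len(nums) - 1
    expected = n * (n + 1) // 2
    return sum(nums) - expected


def find_duplicate_with_set(nums):
    # Hash-based: O(n) extra memory for the set of values already seen.
    seen = set()
    for x in nums:
        if x in seen:
            return x
        seen.add(x)
    raise ValueError("no duplicate found")


def find_duplicate_with_bitmap(nums, max_value=1_000_000):
    # One bit per possible value: about 125 KB for values up to 1,000,000.
    bits = bytearray(max_value // 8 + 1)
    for x in nums:
        byte_index, bit_index = divmod(x, 8)
        mask = 1 << bit_index
        if bits[byte_index] & mask:
            return x
        bits[byte_index] |= mask
    raise ValueError("no duplicate found")
```

All three return 2 for `[1, 2, 3, 4, 2]`; only the sum version depends on the "all numbers are included" assumption.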
I recently came across another solution and I need some help in understanding the logic behind it:
Keep a single radix accumulator. You exclusive-or the accumulator with
both the index and the value at that index. The fact that x ^ C ^ x == C is useful here, since each number will be
xor’d twice, except the one that’s in there twice, which will appear 3
times. (x ^ x ^ x == x) And the final index, which will appear once.
So if we seed the accumulator with the final index, the accumulator’s
final value will be the number that is in the list twice.
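A minimal Python sketch of the quoted idea, assuming positions are numbered from 1, the array has length n, and it holds the values 1..n-1 with exactly one of them repeated, so n is the "final index" that has no matching value. The name `find_duplicate_xor` is hypothetical.

```python
def find_duplicate_xor(nums):
    """Return the value that appears twice in nums.

    Assumes nums has length n, contains each of 1..n-1 once plus one
    extra copy of one of them, and that positions are numbered 1..n.
    """
    n = len(nums)
    acc = n  # seed with the final index, which otherwise appears only once
    for position, value in enumerate(nums, start=1):
        # Every index is XORed in once and every value once; matching
        # pairs cancel because x ^ x == 0.
        acc ^= position ^ value
    return acc
```

For example, `find_duplicate_xor([1, 3, 2, 3])` returns 3. Like the sum trick, it uses O(1) extra memory, but XOR cannot overflow in languages with fixed-width integers.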
I would appreciate it if someone could help me understand the logic behind this approach (with a small example!).
Assume you have an accumulator that starts at 0.
At each step of your loop, you XOR the accumulator with i and v (accumulator ^= i ^ v), where i is the index of the loop iteration and v is the value in the ith position of the array. Normally, i and v will be the same number, so you will end up doing accumulator ^= i ^ i. But i ^ i == 0, so this is a no-op and the value of the accumulator is left untouched. At this point I should say that the order of the numbers in the array does not matter, because XOR is commutative and associative; even if the array is shuffled to begin with, the result at the end should still be 0 (the initial value of the accumulator).

Now what if a number occurs twice in the array? This number will appear three times in the XORing (once for the index equal to the number, once for the normal appearance of the number, and once for the extra appearance). Furthermore, one of the other numbers will appear only once (only for its index).
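The cancellation argument can be checked directly: with no duplicate (a permutation of 1..N in positions 1..N), XORing every index and value leaves 0 however the array is ordered. A small Python sketch, assuming 1-based positions; `xor_indices_and_values` is a hypothetical helper.

```python
import random


def xor_indices_and_values(nums):
    """XOR every 1-based position and every value of nums into one accumulator."""
    acc = 0
    for position, value in enumerate(nums, start=1):
        acc ^= position ^ value
    return acc


values = list(range(1, 11))  # a permutation of 1..10, no duplicate
random.shuffle(values)
print(xor_indices_and_values(values))  # prints 0 whatever order shuffle produced

# The three-appearance case from the quote: x ^ x ^ x == x
x = 7
print(x ^ x ^ x)  # prints 7
```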
This solution then assumes that the number that appears only once is equal to the last index of the array, or in other words, that the range of numbers in the array is contiguous and starts from the first index to be processed (edit: thanks to caf for pointing this out in a comment; this is what I had in mind, but I messed it up when writing). With this as a given (N, the last index, appears only once), consider that starting with accumulator = N effectively makes N appear twice in the XORing as well. At this point, we are left with numbers that appear exactly twice and just one number that appears three times. Since the twice-appearing numbers XOR out to 0, the final value of the accumulator will be equal to the number that appears three times (i.e. the one with an extra copy).
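Putting it together on a small example, assuming 1-based positions as above: N = 4 slots holding [1, 3, 2, 3], so 4 is the index with no matching value and 3 is the duplicate. The hypothetical `trace_xor` helper below records the accumulator after seeding and after each step.

```python
def trace_xor(nums):
    """Return the accumulator after seeding with N and after each loop step."""
    n = len(nums)
    acc = n  # seed with the last index N
    history = [acc]
    for position, value in enumerate(nums, start=1):
        acc ^= position ^ value
        history.append(acc)
    return history


print(trace_xor([1, 3, 2, 3]))  # [4, 4, 5, 4, 3]
print(trace_xor([3, 1, 3, 2]))  # [4, 6, 5, 5, 3]
```

Shuffling the array changes the intermediate values but not the final one, which is the duplicate 3 in both runs.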