I have the following question, and it screams at me for a solution with hashing:
Problem :
Given a huge list of numbers x1, ..., xn, where xi <= T, we’d like to know
whether there exist two indices i != j such that x_i == x_j.
Find an algorithm for the problem with O(n) run time, where the O(n) bound holds in expectation.
My solution at the moment: we use hashing, with a hash function h(x) and chaining.
First, we build a new array, let’s call it A, where each cell is a linked list. This is the destination array.
Next, we go over all n numbers and map each element of x1, ..., xn to its cell using the hash function. This takes O(n) time.
After that we scan A and look for collisions. If we find a cell where length(A[k]) > 1,
we return the xi and xj that were mapped to A[k]. The total run time of this scan is O(n) in the worst case, which happens when the two equal numbers (if they exist) were mapped to the last cell of A.
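The two-pass idea described above can be sketched in Python roughly as follows. This is only an illustration: the function name `find_duplicate_two_pass`, the table size `m`, and the choice of `x % m` as h(x) are assumptions, not part of the question. One caveat: two different values can land in the same cell, so a chain longer than 1 is not by itself proof of a duplicate. The sketch therefore compares the values inside each chain.

```python
def find_duplicate_two_pass(xs, m=None):
    """Return indices (i, j), i < j, with xs[i] == xs[j], or None if all distinct."""
    n = len(xs)
    if m is None:
        m = max(1, n)  # assumed table size: about one cell per element

    # A[k] is the chain of (value, index) pairs hashed to cell k.
    A = [[] for _ in range(m)]

    # Pass 1: map every element to its cell.
    for i, x in enumerate(xs):
        A[x % m].append((x, i))  # h(x) = x mod m is an assumed hash function

    # Pass 2: scan the cells; only equal values inside a chain count as a hit.
    for chain in A:
        if len(chain) > 1:
            for a in range(len(chain)):
                for b in range(a + 1, len(chain)):
                    if chain[a][0] == chain[b][0]:
                        return chain[a][1], chain[b][1]
    return None
```

With a reasonable hash function the chains are expected to be short, so the pairwise comparison inside a chain should not change the expected O(n) bound. An adversarial input that forces many distinct values into one chain would make it slower.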
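The same approach can be about twice as fast on average. It is still O(n) on average, but with better constants. There is no need to map all the elements into the hash table and then go over it. A faster solution is to check, for each element, whether it is already in the table before inserting it, and to stop at the first hit.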
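A minimal sketch of such a single-pass check, assuming Python's built-in `set` as the hash table (the function name `has_duplicate` is made up for illustration):

```python
def has_duplicate(xs):
    """Return True as soon as some value is seen for the second time."""
    seen = set()  # hash set of the values encountered so far
    for x in xs:
        if x in seen:  # expected O(1) lookup
            return True  # early exit: the rest of the list is never touched
        seen.add(x)
    return False
```

The early exit is where the better constant comes from: the list is traversed once, and the traversal stops at the first repeated value instead of filling the whole table first.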
Also note that if T < n, there must be a dupe within the first T+1 elements, by the pigeonhole principle.
Also, for small T you can use a simple array of size T; no hash is needed (hash(x) = x). Initializing the array to contain zeros as initial values can be done in O(1).
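Both remarks can be combined in a short sketch. It assumes every value is an integer with 1 <= x <= T, which is what makes the pigeonhole bound of T+1 elements hold. The function name `has_duplicate_small_range` is hypothetical. The array below has T+1 cells so that the value can be used directly as an index, leaving index 0 unused.

```python
def has_duplicate_small_range(xs, T):
    """Direct-address check, assuming every x is an integer with 1 <= x <= T."""
    # seen[v] is True once value v has appeared; index 0 is unused.
    # Note: building this list costs O(T) in Python. The O(1) initialization
    # mentioned above needs a separate technique that is not shown here.
    seen = [False] * (T + 1)

    # By pigeonhole, a duplicate (if n > T) shows up within the first T + 1 elements.
    limit = min(len(xs), T + 1)
    for i in range(limit):
        x = xs[i]
        if seen[x]:  # hash(x) = x: the value itself is the index
            return True
        seen[x] = True
    return False
```

If the list has at most T elements, the loop simply examines all of them, so the result is still correct when no duplicate is guaranteed.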