I have been trying to come up with a way to write an efficient algorithm to perform an unsorted intersection on two vectors/arrays, but with no luck. I am working with one large non-unique array (generally 500,000 to 1,000,000 values), and one relatively smaller (maybe 5000 values max) unique array.
I have seen a variety of methods suggested on here involving techniques such as unordered_set, but to my understanding, this doesn't work if one of the arrays is non-unique. Secondly, instead of having an output vector that contains the numbers common to both arrays, I'd like the output vector to contain the indices of those common values with respect to the larger array. So, if the larger array has 5 locations that equal one of the values in the smaller array, I need each of those 5 indices. Perhaps something similar to NumPy's in1d function in Python.
Anyone have any ideas? Thanks
Put the unique side into an unordered_set, and go through the non-unique side one by one. If you find an item at non_unique_side[i] in the unordered_set(unique_side), add i to the result.
Assuming that unordered_set is implemented as a hash set with O(1) average insertion and lookup times, this algorithm gets you O(L+S) expected time complexity, where L is the number of items in the larger list, and S is the number of items in the smaller set. Asymptotically, this is as fast as you can do an intersection, since every element of both inputs has to be read at least once.
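The approach described above can be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: the function name indices_in is made up for this example, and int elements are an assumption (any type that is hashable with std::hash would work the same way).

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns, in ascending order, every index i for which
// large[i] also occurs in small. Element type int is an assumption.
std::vector<std::size_t> indices_in(const std::vector<int>& large,
                                    const std::vector<int>& small) {
    // Build the hash set from the smaller (unique) side: O(S) on average.
    const std::unordered_set<int> lookup(small.begin(), small.end());

    std::vector<std::size_t> result;
    // Single pass over the larger side: O(L) on average. Duplicates in
    // `large` are fine, because each matching position is recorded separately.
    for (std::size_t i = 0; i < large.size(); ++i) {
        if (lookup.count(large[i]) != 0) {
            result.push_back(i);
        }
    }
    return result;
}
```

For example, with large = {7, 3, 7, 9, 3, 1} and small = {3, 7}, the function returns the indices {0, 1, 2, 4}. Only the smaller side goes into the set, so the larger array being non-unique does not matter.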