What is the fastest method of finding duplicates across multiple (large) linked lists?
I will attempt to illustrate the problem with arrays instead just to make it a bit more readable. (I used numbers from 0-9 for simplicity instead of pointers).
list1[] = {1,2,3,4,5,6,7,8,9,0};
list2[] = {0,2,3,4,5,6,7,8,9,1};
list3[] = {4,5,6,7,8,9,0,1,2,3};
list4[] = {8,2,5};
list5[] = {1,1,2,2,3,3,4,4,5,5};
If I now ask: ‘does the number 8 exist in list1-5?’ I could sort each list, remove its duplicates, merge all the lists into a “superlist”, and see if the number of (new) duplicates equals the number of lists that I searched through. If I get the correct number of duplicates, I can assume that what I searched for (8) exists in all of the lists.
If I instead searched for 1, I would only get four duplicates, so it is not found in all of the lists.
Is there a faster/smarter/better way to achieve the above without sorting and/or changing the lists in any way?
P.S.: This question is asked mostly out of pure curiosity and nothing else! 🙂
Define an array `hash` and set all of its values to 0.
Now, for each element in your list, use its value as an index into `hash` and increment that location of `hash`. Each occurrence of a number increments the value at its `hash` location once, so a duplicated value `i` would have `hash[i] > 1`.
If you want to remove the duplicates and create a new list, scan the `hash` array and, for each `i` that is present (i.e. `hash[i] > 0`), load it into a new list in the order in which the values appeared in the original list.
Note that negative numbers cannot be used directly as indices. To handle them, first find the largest magnitude among the negative numbers and add that magnitude to every number when using it to index the `hash` array.
Alternatively, in the implementation you can allocate contiguous memory and define a pointer to the middle of the allocated block, so that you can move both forwards and backwards from it and use negative indices. You need to make sure that there is enough memory both in front of and behind the pointer.
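A minimal sketch of the pointer-into-the-middle idea, under stated assumptions: `SYMBOLS`, `hash_ptr`, `make_centered_table` and `count_value` are names chosen here for illustration, and every value is assumed to lie in `[-SYMBOLS/2, SYMBOLS/2]`. One extra slot is allocated so that both ends of that range are valid indices.

```c
#include <assert.h>
#include <stdlib.h>

#define SYMBOLS 20  /* assumption: every value lies in [-SYMBOLS/2, SYMBOLS/2] */

/* Allocate SYMBOLS + 1 zeroed counters and return a pointer to the middle one.
   The start of the block is stored in *block_out so it can be freed later. */
static int *make_centered_table(int **block_out)
{
    int *block = calloc(SYMBOLS + 1, sizeof *block);
    if (block == NULL)
        return NULL;
    *block_out = block;
    return block + SYMBOLS / 2;   /* hash_ptr[0] is the centre slot */
}

/* Count one occurrence of v; a negative v indexes backwards from the centre. */
static void count_value(int *hash_ptr, int v)
{
    assert(v >= -SYMBOLS / 2 && v <= SYMBOLS / 2);
    hash_ptr[v]++;
}
```

When the table is no longer needed, pass the original block pointer to `free`, not the centred `hash_ptr`.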
Now you can use `hash_ptr[-6]`, or any `hash_ptr[i]` with `-SYMBOLS/2 < i < SYMBOLS/2 + 1`.
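The counting scheme from the start of this answer could be sketched roughly as follows. It assumes small non-negative values (0 to 9, as in the question's example); `MAX_VALUE`, `count_values` and `remove_duplicates` are hypothetical names introduced here. To keep the original order, `remove_duplicates` walks the input list with a "seen" table rather than scanning the count array, which is a small variation on the description above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VALUE 9   /* assumption: all values lie in 0..MAX_VALUE */

/* Count how often each value occurs; hash[v] > 1 marks v as a duplicate. */
static void count_values(const int *list, size_t n, int hash[MAX_VALUE + 1])
{
    for (size_t v = 0; v <= MAX_VALUE; v++)
        hash[v] = 0;
    for (size_t k = 0; k < n; k++) {
        assert(list[k] >= 0 && list[k] <= MAX_VALUE);
        hash[list[k]]++;
    }
}

/* Copy each distinct value to out in order of first appearance.
   Returns the number of values written; out must hold at least n items. */
static size_t remove_duplicates(const int *list, size_t n, int *out)
{
    int seen[MAX_VALUE + 1] = {0};
    size_t m = 0;
    for (size_t k = 0; k < n; k++) {
        assert(list[k] >= 0 && list[k] <= MAX_VALUE);
        if (seen[list[k]] == 0) {
            seen[list[k]] = 1;      /* first time this value is met */
            out[m++] = list[k];
        }
    }
    return m;
}
```

Both functions make a single pass over the list and leave it unmodified, at the cost of a table whose size depends on the range of values rather than on the length of the list.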