I’ll try to describe my problem as best I can; please ask if anything doesn’t make sense.
- I have a finite number of lists
- Each list contains a finite number of contacts
- Each contact is represented as a HashMap
- Each list is linked to a provider
- The same contact may be present in multiple providers (and hence multiple lists).
- I need a ‘master’ list that contains all the unique entries in the other lists
I’m looking for an efficient way to merge these lists into a master list without duplicates. For example, if the same contact appears in multiple lists (multiple HashMaps corresponding to the same physical person), I want to merge all of those HashMaps into a single one and put the merged HashMap into the master list. A simple ‘putAll’ won’t do here, because I need to re-key the contents so that I can access them efficiently (e.g. provider 1 gives me a list of email addresses keyed as ‘emails’, while provider 2 gives me the same information keyed as ‘emailList’).
Merging the individual HashMaps is the easier of the two problems, since I know all the keys involved and can merge them easily.
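The re-keying step can be sketched as a per-provider alias table that maps each provider’s field names onto one canonical set. This is a minimal sketch: the class name ContactNormalizer and the provider ids "provider1" and "provider2" are hypothetical; only the key names ‘emails’ and ‘emailList’ come from the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that re-keys each provider's map onto one canonical key set.
final class ContactNormalizer {
    // Provider-specific key -> canonical key. "emails" and "emailList" come from the
    // question; the provider ids "provider1"/"provider2" are made up for illustration.
    private static final Map<String, Map<String, String>> ALIASES = Map.of(
            "provider1", Map.of("emails", "emails"),
            "provider2", Map.of("emailList", "emails"));

    static Map<String, Object> normalize(String provider, Map<String, ?> raw) {
        Map<String, String> aliases = ALIASES.getOrDefault(provider, Map.of());
        Map<String, Object> canonical = new HashMap<>();
        for (Map.Entry<String, ?> e : raw.entrySet()) {
            // Keys without an alias are copied under their original name.
            canonical.put(aliases.getOrDefault(e.getKey(), e.getKey()), e.getValue());
        }
        return canonical;
    }
}
```

Once every provider’s map has been normalized this way, two maps for the same person can be combined key by key without caring which provider they came from.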
The problem that has me scratching my head is scanning the lists efficiently. Is there really no alternative to walking linearly through each list in a nested loop, taking the next HashMap, checking whether it already exists in the master list, and then either merging it or creating a new entry?
First observation – using a HashMap to represent your contacts smells of “object denial”.
You need to design and implement a Contact class to represent a contact. Without this class, your task is a whole bunch harder than it needs to be.
The class needs getters for all of the contact key fields, and it needs to implement equals and hashCode, as well as Comparable, all based on the key fields. Getters (and optionally setters) are also needed for the non-key fields.
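A minimal sketch of such a class, assuming purely for illustration that first and last name are the key fields and that email addresses are the only non-key field; substitute whatever actually identifies one person across your providers. The ordering used by compareTo is built from the same fields as equals and hashCode, so the three stay consistent. The mergeFrom method is a hypothetical name for the field-by-field merge.

```java
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical Contact: the key fields (lastName, firstName) are an assumption;
// pick whatever fields actually identify one physical person in your data.
final class Contact implements Comparable<Contact> {
    private static final Comparator<Contact> ORDER =
            Comparator.comparing(Contact::getLastName).thenComparing(Contact::getFirstName);

    private final String firstName;                            // key field
    private final String lastName;                             // key field
    private final Set<String> emails = new LinkedHashSet<>();  // non-key field

    Contact(String firstName, String lastName) {
        this.firstName = Objects.requireNonNull(firstName);
        this.lastName = Objects.requireNonNull(lastName);
    }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
    Set<String> getEmails() { return emails; }

    // Fold another record for the same person into this one (non-key fields only).
    void mergeFrom(Contact other) { emails.addAll(other.emails); }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Contact)) return false;
        Contact c = (Contact) o;
        return firstName.equals(c.firstName) && lastName.equals(c.lastName);
    }

    @Override public int hashCode() { return Objects.hash(firstName, lastName); }

    // Consistent with equals: returns 0 exactly when both key fields match.
    @Override public int compareTo(Contact other) { return ORDER.compare(this, other); }
}
```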
With that, the merging process becomes (pseudo-code):
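One possible reading of the sort-and-merge phases, sketched in Java under stated assumptions rather than as the exact steps intended: the master list is assumed to be kept sorted by the key fields, the incoming contacts are sorted (O(N log N)), and the two sorted lists are then walked together once (O(M + N)). The class and method names are hypothetical, and combine stands for the field-by-field merge of two records for the same person. The method is generic, so it works with any class whose compareTo uses the key fields, such as the Contact class described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical sort-then-merge pass; assumes compareTo() == 0 means "same person".
final class MasterListMerger {

    static <T extends Comparable<? super T>> List<T> merge(
            List<T> sortedMaster, List<T> incoming, BinaryOperator<T> combine) {
        // Sort the incoming records by their key fields: O(N log N).
        List<T> sorted = new ArrayList<>(incoming);
        sorted.sort(Comparator.naturalOrder());

        // Walk both sorted lists once, like the merge step of merge sort: O(M + N).
        List<T> result = new ArrayList<>(sortedMaster.size() + sorted.size());
        int i = 0, j = 0;
        while (i < sortedMaster.size() || j < sorted.size()) {
            T next;
            if (j == sorted.size()) {
                next = sortedMaster.get(i++);
            } else if (i == sortedMaster.size()) {
                next = sorted.get(j++);
            } else {
                int cmp = sortedMaster.get(i).compareTo(sorted.get(j));
                next = cmp <= 0 ? sortedMaster.get(i++) : sorted.get(j++);
            }
            // Equal keys arrive adjacent to each other, so a duplicate can only
            // match the last emitted entry: fold it in instead of appending it.
            int last = result.size() - 1;
            if (last >= 0 && result.get(last).compareTo(next) == 0) {
                result.set(last, combine.apply(result.get(last), next));
            } else {
                result.add(next);
            }
        }
        return result;
    }
}
```

Because the output is itself sorted, it can serve directly as the sortedMaster input of the next run.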
The performance characteristics of the various phases should be:
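- O(N)
- O(N)
- O(N log N)
- O(M + N)
The overall performance should therefore be O(N log N), where N is the total number of master and merge Contact objects. That is much better than the O(N²) cost of the nested-loop scan described in the question.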
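If the order of the master list does not matter, a hash-based variant is a common alternative. This is a swapped-in technique, not the sort-and-merge the answer describes: it relies only on equals and hashCode over the key fields and has expected O(N) cost, assuming a reasonably distributed hashCode. A minimal generic sketch with hypothetical names, using the standard Map.merge method:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hash-based merge: each record is looked up by its key fields (equals/hashCode).
final class HashMerger {
    static <T> List<T> mergeAll(List<? extends List<T>> providerLists, BinaryOperator<T> combine) {
        // LinkedHashMap keeps first-seen order, which makes the output deterministic.
        Map<T, T> master = new LinkedHashMap<>();
        for (List<T> list : providerLists) {
            for (T contact : list) {
                // Expected O(1): insert if new, otherwise combine with the existing entry.
                master.merge(contact, contact, combine);
            }
        }
        return new ArrayList<>(master.values());
    }
}
```

The trade-off is that the result comes out in first-seen order rather than sorted by key; if sorted output is also required, the sort step reappears.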