I have a list of objects, say List<Entity>. The Entity class has an equals method based on a few attributes (a business rule) that differentiates one Entity object from another.
The task that we usually carry out on this list is to remove all the duplicates, something like this:
    List<Entity> noDuplicates = new ArrayList<Entity>();
    for (Entity entity : lstEntities)
    {
        int indexOf = noDuplicates.indexOf(entity);
        if (indexOf >= 0)
        {
            noDuplicates.get(indexOf).merge(entity);
        }
        else
        {
            noDuplicates.add(entity);
        }
    }
Now, the problem I have been observing is that this part of the code slows down considerably as soon as the list has more than 10,000 objects. I understand that ArrayList is doing an O(N) search.
Is there a faster alternative? Using a HashMap is not an option, because the entity's uniqueness is built upon four of its attributes together, and it would be tedious to put the key itself into the map. Will a sorted set help with faster querying?
Thanks
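The algorithm you posted is actually worse than O(N):
- Iterating through the input list lstEntities is O(N).
- Within that loop you call ArrayList.indexOf(T), which has to scan the noDuplicates list: O(N) again.
Your algorithm is actually O(N^2) in the worst case, since you are potentially scanning a whole list on every iteration of the loop.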
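To make the quadratic growth concrete, here is a small sketch that counts how many times equals() runs when the loop from the question is fed N distinct objects. CountingEntity and QuadraticDemo are hypothetical stand-ins invented for this illustration; they are not part of the question's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Entity that counts every equals() call.
final class CountingEntity {
    static long equalsCalls = 0;
    private final int key;

    CountingEntity(int key) { this.key = key; }

    @Override
    public boolean equals(Object o) {
        equalsCalls++;
        return o instanceof CountingEntity && ((CountingEntity) o).key == key;
    }

    @Override
    public int hashCode() { return Integer.hashCode(key); }
}

final class QuadraticDemo {
    // Runs the indexOf-based loop over n distinct objects and
    // returns the number of equals() calls it triggered.
    static long countComparisons(int n) {
        CountingEntity.equalsCalls = 0;
        List<CountingEntity> noDuplicates = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            CountingEntity e = new CountingEntity(i);
            // indexOf scans everything collected so far before giving up.
            if (noDuplicates.indexOf(e) < 0) {
                noDuplicates.add(e);
            }
        }
        return CountingEntity.equalsCalls;
    }
}
```

With no duplicates at all, the k-th element is compared against the k-1 elements already collected, so the total is N(N-1)/2 comparisons. For N = 10,000 that is roughly 50 million equals() calls, which matches the slowdown described in the question.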
It sounds like what you want to do is actually two operations:
1. From the input List, remove any duplicates.
2. When duplicates are found, merge them into a single Entity.
You can do both by scanning the list just once, rather than in nested loops. I would recommend breaking up your Entity to move the fields that "identify" an Entity into another type, such as ID, or at the very least adding a getID() method that returns these fields grouped into a single type. This way you can easily build a Map between the two types and merge entities with "duplicate" identities. Iterating through the list is O(n), while HashMap.get(K) is a constant-time operation on average, so the whole pass is O(n).
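A minimal sketch of what the map-based approach might look like. Everything here beyond getID() and merge() is an assumption: the EntityId class, its four placeholder fields (a, b, c, d), the quantity payload, and the Deduplicator helper are hypothetical names chosen for illustration. The sketch also uses a LinkedHashMap rather than a plain HashMap, purely so the result keeps first-seen order like the original ArrayList version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical key type grouping the four identifying fields.
final class EntityId {
    private final String a, b, c, d; // placeholder field names

    EntityId(String a, String b, String c, String d) {
        this.a = a; this.b = b; this.c = c; this.d = d;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EntityId)) return false;
        EntityId other = (EntityId) o;
        return Objects.equals(a, other.a) && Objects.equals(b, other.b)
            && Objects.equals(c, other.c) && Objects.equals(d, other.d);
    }

    // Must agree with equals() so the key works in a hash-based map.
    @Override
    public int hashCode() { return Objects.hash(a, b, c, d); }
}

class Entity {
    private final EntityId id;
    private int quantity; // hypothetical payload that merge() combines

    Entity(String a, String b, String c, String d, int quantity) {
        this.id = new EntityId(a, b, c, d);
        this.quantity = quantity;
    }

    EntityId getID() { return id; }
    int getQuantity() { return quantity; }

    // Stand-in for the real business merge logic.
    void merge(Entity other) { this.quantity += other.quantity; }
}

final class Deduplicator {
    // Single pass: one map lookup per element instead of a list scan.
    static List<Entity> mergeDuplicates(List<Entity> lstEntities) {
        Map<EntityId, Entity> byId = new LinkedHashMap<>();
        for (Entity entity : lstEntities) {
            Entity existing = byId.get(entity.getID());
            if (existing != null) {
                existing.merge(entity);
            } else {
                byId.put(entity.getID(), entity);
            }
        }
        return new ArrayList<>(byId.values());
    }
}
```

The key point is that EntityId overrides both equals() and hashCode() over the same four fields. Once that is in place, the composite key is no more tedious to use than a single-field key, which addresses the concern raised in the question.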