Java allows the calculation of the (set theoretic) difference and the intersection of two Collection objects, via the removeAll() and retainAll() methods of the Collection interface.
The implementation of these two methods in the AbstractCollection class of Java 6 is:
    public boolean removeAll(Collection<?> c) { // Difference
        boolean modified = false;
        Iterator<?> e = iterator();
        while (e.hasNext()) {
            if (c.contains(e.next())) {
                e.remove();
                modified = true;
            }
        }
        return modified;
    }

    public boolean retainAll(Collection<?> c) { // Intersection
        boolean modified = false;
        Iterator<E> e = iterator();
        while (e.hasNext()) {
            if (!c.contains(e.next())) {
                e.remove();
                modified = true;
            }
        }
        return modified;
    }
Is there any way of implementing or executing the above (obviously expensive) operations faster?
For example, would there be any overall performance gain from sorting a Collection before calculating the differences or the intersection?
Is there any class of the Collections framework preferable (performance-wise) for using these operations?
Yes, a faster method is possible. The code you supplied calls c.contains() for every element the iterator e returns, and when c is a list, each contains() call is itself a linear scan of c. With two lists of 100 elements, that is up to 100 × 100 = 10,000 comparisons.
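As a rough check on that quadratic cost, here is a small sketch that counts equals() calls when retainAll() runs on two ArrayLists with no elements in common (the worst case). The ComparisonCount and Counted classes are made up for this illustration and are not part of the JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, for illustration only.
final class ComparisonCount {

    // Wrapper whose equals() increments a shared counter on every call.
    static final class Counted {
        static long calls = 0;
        final int value;

        Counted(int value) { this.value = value; }

        @Override
        public boolean equals(Object o) {
            calls++;
            return o instanceof Counted && ((Counted) o).value == value;
        }

        @Override
        public int hashCode() { return value; }
    }

    // Builds two disjoint lists of n elements each and returns how many
    // equals() calls a.retainAll(b) performs.
    static long countForDisjointLists(int n) {
        List<Counted> a = new ArrayList<Counted>();
        List<Counted> b = new ArrayList<Counted>();
        for (int i = 0; i < n; i++) {
            a.add(new Counted(i));      // values 0 .. n-1
            b.add(new Counted(n + i));  // values n .. 2n-1, so nothing matches
        }
        Counted.calls = 0;
        a.retainAll(b);                 // every contains() scans all of b
        return Counted.calls;
    }
}
```

Calling ComparisonCount.countForDisjointLists(100) should report n × n calls, since each of the 100 lookups scans all 100 elements of the other list without finding a match.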
If you sort both lists first, you only have to keep comparing the front elements of the two. Apart from the cost of the sort itself, that takes at most a couple hundred comparisons, much like the merge step of merge sort. To compute the intersection of the sorted collections:
leftandright:
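The merge-style walk described above could be sketched in Java roughly as follows. This is only one possible reading of the idea, not the original pseudocode. It assumes both inputs are already sorted in ascending order (for example with Collections.sort()), that the elements are Comparable, and that the lists support fast index access (ArrayList rather than LinkedList). The class name SortedSetOps and both method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names and signatures are illustrative only.
final class SortedSetOps {

    // Elements of left that also occur in right. Both lists must be sorted ascending.
    static <T extends Comparable<? super T>> List<T> intersect(List<T> left, List<T> right) {
        List<T> result = new ArrayList<T>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp < 0) {
                i++;                      // left element is smaller: not in right
            } else if (cmp > 0) {
                j++;                      // right element is smaller: skip it
            } else {
                result.add(left.get(i));  // match: keep it
                i++;                      // j stays, so repeated left values also match
            }
        }
        return result;
    }

    // Elements of left that do not occur in right. Both lists must be sorted ascending.
    static <T extends Comparable<? super T>> List<T> difference(List<T> left, List<T> right) {
        List<T> result = new ArrayList<T>();
        int i = 0, j = 0;
        while (i < left.size()) {
            if (j >= right.size()) {
                result.add(left.get(i++)); // right is exhausted: keep the rest of left
                continue;
            }
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp < 0) {
                result.add(left.get(i));   // smaller than everything left in right: keep
                i++;
            } else if (cmp > 0) {
                j++;
            } else {
                i++;                       // found in right: drop it
            }
        }
        return result;
    }
}
```

Each loop iteration advances one of the two indices, so a call makes at most left.size() + right.size() comparisons. Unlike removeAll() and retainAll(), these methods return a new list instead of modifying the receiver. Whether the extra sorting step pays off depends on the sizes involved and on whether the data is already sorted.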