I have quite large List named items (>= 1,000,000 items) and some condition denoted by <cond> that selects items to be deleted and <cond> is true for many (maybe half) of items on my list.
My goal is to efficiently remove items selected by <cond> and retain all other items, source list may be modified, new list may be created – best way to do it should be chosen considering performance.
Here is my testing code:
System.out.println("preparing items");
List<Integer> items = new ArrayList<Integer>(); // Integer is for demo
for (int i = 0; i < 1000000; i++) {
items.add(i * 3); // just for demo
}
System.out.println("deleting items");
long startMillis = System.currentTimeMillis();
items = removeMany(items);
long endMillis = System.currentTimeMillis();
System.out.println("after remove: items.size=" + items.size() +
" and it took " + (endMillis - startMillis) + " milli(s)");
and naive implementation:
public static <T> List<T> removeMany(List<T> items) {
int i = 0;
Iterator<T> iter = items.iterator();
while (iter.hasNext()) {
T item = iter.next();
// <cond> goes here
if (/*<cond>: */i % 2 == 0) {
iter.remove();
}
i++;
}
return items;
}
As you can see I used item index modulo 2 == 0 as remove condition (<cond>) – just for demonstation purposes.
What better version of removeMany may be provided and why this better version is actually better?
OK, it’s time for test results of proposed approaches. Here what approaches I have tested (name of each approach is also class name in my sources):
NaiveRemoveManyPerformer–ArrayListwith iterator and remove – first and naive implementation given in my question.BetterNaiveRemoveManyPerformer–ArrayListwith backward iteration and removal from end to front.LinkedRemoveManyPerformer– naive iterator and remove but working onLinkedList. Disadventage: works only forLinkedList.CreateNewRemoveManyPerformer–ArrayListis made as a copy (only retained elements are added), iterator is used to traverse inputArrayList.SmartCreateNewRemoveManyPerformer– betterCreateNewRemoveManyPerformer– initial size (capacity) of resultArrayListis set to final list size. Disadvantage: you must know final size of list when starting.FasterSmartCreateNewRemoveManyPerformer– even better (?)SmartCreateNewRemoveManyPerformer– use item index (items.get(idx)) instead of iterator.MagicRemoveManyPerformer– works in place (no list copy) forArrayListand compacts holes (removed items) from beginning with items from end of the list. Disadventage: this approach changes order of items in list.ForwardInPlaceRemoveManyPerformer– works in place forArrayList– move retaining items to fill holes, finally subList is returned (no final removal or clear).GuavaArrayListRemoveManyPerformer– Google GuavaIterables.removeIfused forArrayList– almost the same asForwardInPlaceRemoveManyPerformerbut does final removal of items at the end of list.Full source code is given at the end of this answer.
Tests where performed with different list sizes (from 10,000 items to 10,000,000 items) and different remove factors (specifying how many items must be removed from list).
As I posted here in comments for other answers – I have thought that copying items from
ArrayListto secondArrayListwill be faster than iteratingLinkedListand just removing items. Sun’s Java Documentation says that constant factor ofArrayListis low compared to that for theLinkedListimplementation, but surprisingly this is not the case in my problem.In practice
LinkedListwith simple iteration and removal has best performance in most cases (this approach is implemented inLinkedRemoveManyPerformer). Usually onlyMagicRemoveManyPerformerperformance is comparable toLinkedRemoveManyPerformer, other approaches are significantly slower. Google GuavaGuavaArrayListRemoveManyPerformeris slower than hand made similar code (because my code does not remove unnecessary items at end of list).Example results for removing 500,000 items from 1,000,000 source items:
NaiveRemoveManyPerformer: test not performed – I’m not that patient, but it performs worse thanBetterNaiveRemoveManyPerformer.BetterNaiveRemoveManyPerformer: 226080 milli(s)LinkedRemoveManyPerformer: 69 milli(s)CreateNewRemoveManyPerformer: 246 milli(s)SmartCreateNewRemoveManyPerformer: 112 milli(s)FasterSmartCreateNewRemoveManyPerformer: 202 milli(s)MagicRemoveManyPerformer: 74 milli(s)ForwardInPlaceRemoveManyPerformer: 69 milli(s)GuavaArrayListRemoveManyPerformer: 118 milli(s)Example results for removing 1 item from 1,000,000 source items (first item is removed):
Example results for removing 333,334 items from 1,000,000 source items:
Example results for removing 1,000,000 (all) items from 1,000,000 source items (all items are removed but with one-by-one processing, if you know a priori that all items are to be removed, list should be simply cleared):
My final conclusions: use hybrid approach – if dealing with LinkedList – simple iteration and removal is best, if dealing with ArrayList – it depends if item order is important – use ForwardInPlaceRemoveManyPerformer then, if item order may be changed – best choice is MagicRemoveManyPerformer. If remove factor is known a priori (you know how many items will be removed vs retained) then some more conditionals may be put to select approach performing even better in particular situation. But known remove factor is not usual case… Google Guava
Iterables.removeIfis such a hybrid solution but with slightly different assumption (original list must be changed, new one cannot be created and item order always matters) – these are most common assumptions soremoveIfis best choice in most real-life cases.Notice also that all good approaches (naive is not good!) are good enough – any one of them shold do just fine in real application, but naive approach must be avoided.
At last – my source code for testing.