Given a list of KeyValuePairs, where each pair has a getValue() method, what would be the fastest way to obtain a List (or Set) of unique Values?
All of the methods below produce an acceptable result. u1 seems to be the fastest at the expected list size (about 1000-2000 KVPs).
Can we do better (faster)?
private static Set<String> u1(List<_KVPair> pairs) {
    Set<String> undefined = new HashSet<String>();
    for (_KVPair pair : pairs) {
        undefined.add(pair.getValue());
    }
    if (undefined.size() == 1) {
        return new HashSet<String>();
    }
    return undefined;
}
private static List<String> u2(List<_KVPair> pairs) {
    List<String> undefined = new ArrayList<String>();
    for (_KVPair pair : pairs) {
        if (!undefined.contains(pair.getValue())) {
            undefined.add(pair.getValue());
        }
    }
    return undefined;
}
private static List<String> u3(List<_KVPair> pairs) {
    List<String> undefined = new LinkedList<String>();
    Iterator<_KVPair> it = pairs.iterator();
    while (it.hasNext()) {
        String value = it.next().getValue();
        if (!undefined.contains(value)) {
            undefined.add(value);
        }
    }
    return undefined;
}
At about 3600 pairs, ‘u3’ wins. At about 1500 pairs, ‘u1’ wins.
The first option (u1) should be faster. You could possibly make it even faster by sizing the set before using it, typically to the number of pairs if you expect a small number of duplicates:
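A minimal sketch of that presizing, assuming a hypothetical KVPair class standing in for the question's _KVPair (the class and method names here are illustrative only). It leaves out u1's special case for a single unique value and only shows how the set is constructed:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PresizedUnique {
    // Hypothetical stand-in for the question's _KVPair type.
    static final class KVPair {
        private final String key;
        private final String value;

        KVPair(String key, String value) {
            this.key = key;
            this.value = value;
        }

        String getKey() {
            return key;
        }

        String getValue() {
            return value;
        }
    }

    static Set<String> uniqueValues(List<KVPair> pairs) {
        // Initial capacity = number of pairs, load factor = 1, so the
        // backing table does not need to grow while the values are added.
        Set<String> unique = new HashSet<String>(pairs.size(), 1f);
        for (KVPair pair : pairs) {
            unique.add(pair.getValue());
        }
        return unique;
    }

    public static void main(String[] args) {
        List<KVPair> pairs = new ArrayList<KVPair>();
        pairs.add(new KVPair("a", "x"));
        pairs.add(new KVPair("b", "y"));
        pairs.add(new KVPair("c", "x"));
        System.out.println(uniqueValues(pairs).size());
    }
}
```

If most values are duplicates, sizing the set to pairs.size() allocates far more buckets than needed, so this is only likely to pay off when duplicates are rare.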
Note that I used 1 for the load factor to prevent any resizing.
Out of curiosity I ran a test (code below) – the results are (post compilation):
Test 1 (note: takes a few minutes with warm up)
Test 2
That kind of makes sense:
List#contains will run fairly fast when there are many duplicates, because a duplicate is found quickly in a short list of unique values, whereas the cost of allocating a large set plus the hashing algorithm is penalising.
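To make that reasoning concrete, here is a rough sketch that counts how many elements a list-based approach like u2 or u3 has to inspect, rather than measuring time. The class name, method names and sample data are all made up for illustration, and the count is only a proxy for the cost of List#contains, not a benchmark:

```java
import java.util.ArrayList;
import java.util.List;

class ContainsCost {
    // Total number of elements inspected by a list-based dedupe
    // (the same scan that List#contains performs in u2 and u3).
    static long scannedElements(List<String> values) {
        List<String> unique = new ArrayList<String>();
        long scanned = 0;
        for (String value : values) {
            int index = unique.indexOf(value);
            if (index >= 0) {
                // Duplicate: the scan stops at the first match.
                scanned += index + 1;
            } else {
                // New value: the whole list was scanned before appending.
                scanned += unique.size();
                unique.add(value);
            }
        }
        return scanned;
    }

    // Builds 'size' values that cycle through 'distinct' different strings.
    static List<String> sample(int size, int distinct) {
        List<String> values = new ArrayList<String>();
        for (int i = 0; i < size; i++) {
            values.add("v" + (i % distinct));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println("10 distinct:   " + scannedElements(sample(1000, 10)));
        System.out.println("1000 distinct: " + scannedElements(sample(1000, 1000)));
    }
}
```

For 1000 values, the scan touches 5,490 elements when only 10 values are distinct, against 499,500 when all of them are. That gap is the quadratic cost a hash-based set avoids when duplicates are rare.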