We are doing some empirical testing of Java data structures and have got some results that we can’t explain properly.
For example when we are testing TreeSet’s last-method which time requirement should be constant we get a bump in running time after the size of TreeSet exceeds 30 000. We run the last-method with increasing number of elements in treeSet 1000 times for every size and then taken the median value of the results.
The relevant code is:
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import jxl.write.WriteException;
public class TestRunner {
public void test(Testable testeCase, String outputFileName, Integer... initArgs){
ExcelWriter excel = null;
try {
excel = new ExcelWriter(outputFileName);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
int measurementsPoints = 35;
//calculate median to every dataset size
for(int j = 0; j < measurementsPoints; j++){
int testCount = 1000;
long startTime;
long endTime;
//double sum = 0;
ArrayList<Integer> results = new ArrayList<Integer>();
for (int i = 0; i < testCount; i++) {
//initialize tested data structure
testeCase.initTestRun(initArgs);
startTime = threadMxBean.getCurrentThreadCpuTime();
// run tested method
testeCase.runTestMethod();
endTime = threadMxBean.getCurrentThreadCpuTime();
results.add((int)(endTime - startTime));
}
Collections.sort(results);
excel.addNumber(j, 5, new Double(initArgs[0]));
excel.addNumber(j, 6, new Double(results.get(testCount / 2)));
//increase the size of the data structure 10, 15, 20, 30, 40, 60, 80...
if(j % 2 == 0){
initArgs[0] = (int)(initArgs[0] * 1.5);
}
else{
initArgs[0] = (int)(initArgs[0] / 3 * 4);
}
}
try {
excel.write();
} catch (WriteException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
import java.util.TreeSet;
public class TreeSetLastTest implements Testable {
private TreeSet<Integer> values;
@Override
public void initTestRun(Integer... integers) {
Integer initialCapacity = integers[0];
values = new TreeSet<Integer>();
for(int i = Integer.MIN_VALUE; i < Integer.MIN_VALUE + initialCapacity; i++){
values.add(i);
}
}
@Override
public void runTestMethod() {
values.last();
}
}
When the number of elements in the treeSet is between 10-30 000 elements the measured median is 3000 ns. When the size of the treeSet increases to 120 000 elements increases the measured median to 13 000 ns and then stays there when the number of elements increase over one million. So what could be the reason that causes the increase or is the time unit just so small that the differences are meaningless in reality. Thanks for the help.
Well I assume it’s worth an answer.
Your assumption that a
TreeSethas O(1)last()is erroneous. First the documentation doesn’t state anything of this kind and in fact aTreeSetin java is implemented using aTreeMapwhich is an implementation of a red-black tree.A red-black tree is similar to an AVL-tree which may be better known in that it guarantees
O(log n)for lookups, i.e. makes sure the tree doesn’t degenerate into a linked list. Basically yourlast()lookup hasO(log n)complexity so will get worse as it gets larger.Presumably because of caching, maybe even paging effects you don’t see the
O(log n)in your benchmark directly.That’s similar to LinkedLists and arrays – in theory linked lists have lots of things going for them, in practice linked lists are one of the worst data structures you can use on modern CPUs. Constant factors do matter after all and memory access patterns are large constant factors.