I have a program which fetches records from a database (using Hibernate) and adds them to a Vector. There was a performance problem with this operation, so I ran a test with the Vector replaced by a HashSet. With 300000 records, the speed gain is immense: from 45 minutes to 2 minutes!
So my question is, what is causing this huge difference? Is it just that all methods in Vector are synchronized, or that Vector uses an array internally whereas HashSet does not? Or something else?
The code is running in a single thread.
EDIT:
The code is only inserting the values in the Vector (and in the other case, HashSet).
If it’s trying to use the Vector as a set, and checking for the existence of a record before adding it, then filling the vector becomes an O(n^2) operation, compared with O(n) for HashSet. It would also become an O(n^2) operation if you insert each element at the start of the vector instead of at the end.
If you’re just using collection.add(item) then I wouldn’t expect to see that sort of difference; synchronization isn’t that slow.
If you can try to test it with different numbers of records, you could see how each version grows as n increases; that would make it easier to work out what’s going on.
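To make the growth argument concrete, here is a minimal sketch of the fill patterns discussed above, timed at a few sizes. It is not the asker’s code: the class name FillPatterns, the method names, and the choice of Integer elements are all made up for illustration, and the sizes are kept small so the quadratic variants finish quickly.

```java
import java.util.HashSet;
import java.util.Vector;
import java.util.function.IntUnaryOperator;

// Hypothetical demo class; none of these names come from the original post.
class FillPatterns {

    // Plain append: amortized O(1) per element, so O(n) overall.
    static int fillAppend(int n) {
        Vector<Integer> v = new Vector<>();
        for (int i = 0; i < n; i++) {
            v.add(i);
        }
        return v.size();
    }

    // Vector used as a set: contains() is a linear scan, so the fill is O(n^2).
    static int fillCheckThenAdd(int n) {
        Vector<Integer> v = new Vector<>();
        for (int i = 0; i < n; i++) {
            if (!v.contains(i)) {
                v.add(i);
            }
        }
        return v.size();
    }

    // Insert at the front: each insert shifts every existing element, also O(n^2).
    static int fillInsertAtFront(int n) {
        Vector<Integer> v = new Vector<>();
        for (int i = 0; i < n; i++) {
            v.add(0, i);
        }
        return v.size();
    }

    // HashSet: add() handles the duplicate check itself in expected O(1) time.
    static int fillHashSet(int n) {
        HashSet<Integer> s = new HashSet<>();
        for (int i = 0; i < n; i++) {
            s.add(i);
        }
        return s.size();
    }

    // Rough wall-clock timing of one fill, in milliseconds.
    static long millis(IntUnaryOperator fill, int n) {
        long start = System.nanoTime();
        fill.applyAsInt(n);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5_000, 10_000, 20_000}) {
            System.out.println("n=" + n
                    + " append=" + millis(FillPatterns::fillAppend, n) + "ms"
                    + " checkThenAdd=" + millis(FillPatterns::fillCheckThenAdd, n) + "ms"
                    + " insertAtFront=" + millis(FillPatterns::fillInsertAtFront, n) + "ms"
                    + " hashSet=" + millis(FillPatterns::fillHashSet, n) + "ms");
        }
    }
}
```

If the real code follows one of the quadratic patterns, doubling n should roughly quadruple its time, while the append and HashSet versions should roughly double. The absolute numbers will vary by machine and JVM, so only the trend is meaningful.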
EDIT: If you’re just using Vector.add then it sounds like something else could be going on, e.g. your database was behaving differently between your different test runs. Here’s a little test application:
Output:
Now obviously this isn’t going to be very accurate (System.currentTimeMillis isn’t the best way of getting accurate timing), but it’s clearly not taking 45 minutes. In other words, you should look elsewhere for the problem, if you really are just calling Vector.add(item).
Now, changing the code above to use
makes an enormous difference – it takes 42 seconds instead of 38ms. That’s clearly a lot worse – but it’s still a long way from being 45 minutes – and I doubt that my desktop is 60 times as fast as yours.
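For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a timing test for plain Vector.add with 300000 items. It is a stand-in, not the test application referred to above: the class name VectorAddTiming, the dummy string values, and the single warm-up pass are assumptions for illustration. It uses System.nanoTime, which is intended for measuring elapsed time, rather than System.currentTimeMillis.

```java
import java.util.Vector;

// Hypothetical timing sketch; not the test application from the answer.
class VectorAddTiming {

    // Appends n dummy strings to a fresh Vector and returns it.
    static Vector<String> fill(int n) {
        Vector<String> vector = new Vector<>();
        for (int i = 0; i < n; i++) {
            vector.add("dummy value " + i);
        }
        return vector;
    }

    // Returns the elapsed time of one fill, in nanoseconds.
    static long timeFillNanos(int n) {
        long start = System.nanoTime();
        fill(n);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 300_000;
        // One untimed pass so class loading and JIT warm-up distort the result less.
        fill(n);
        long elapsedMs = timeFillNanos(n) / 1_000_000;
        System.out.println("Time taken: " + elapsedMs + "ms");
    }
}
```

This is still a crude measurement; a harness such as JMH would be more trustworthy. It is enough, though, to tell milliseconds apart from minutes.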