Very often I have to use objects from the java.util collections framework, objects that implement the Map and Set interfaces.
When I insert several million tuples or entities into these objects (HashMap, TreeMap, etc.), their performance, for both insertion and look-up, slows to a crawl.
I have devised derived classes, essentially compositions of the java.util collection classes, that scale better in performance.
I was wondering if there is an open source equivalent of the java.util collections that is optimized for handling large amounts of data.
For a better-performing collections library, try Trove, which provides collections specialized for primitive types and so avoids the boxing overhead of the standard classes. In general, though, you want to tackle these kinds of issues by streaming, or by another form of lazy loading, so that you can do things like aggregation without loading the entire dataset into memory.
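As a rough illustration of the streaming idea, here is a minimal sketch that aggregates a count, sum, and maximum while reading one record at a time, so memory use stays constant no matter how many records there are. The class name `StreamingAggregate`, the `Summary` holder, and the one-number-per-line input format are all assumptions made up for this example, not anything from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class StreamingAggregate {

    /** Hypothetical running summary: only these fields are kept in memory. */
    public static final class Summary {
        public long count;
        public double sum;
        public double max = Double.NEGATIVE_INFINITY;

        public double mean() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    /**
     * Reads one numeric value per line (an assumed input format) and folds
     * each value into the summary instead of storing it in a collection.
     */
    public static Summary summarize(BufferedReader reader) throws IOException {
        Summary s = new Summary();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            double value = Double.parseDouble(line);
            s.count++;
            s.sum += value;
            if (value > s.max) {
                s.max = value;
            }
        }
        return s;
    }

    public static void main(String[] args) throws IOException {
        // A small in-memory source stands in for a large file or socket.
        BufferedReader in = new BufferedReader(new StringReader("1.5\n2.5\n4.0\n"));
        Summary s = summarize(in);
        System.out.println("count=" + s.count + " mean=" + s.mean() + " max=" + s.max);
    }
}
```

The same pattern applies to any source that can be read incrementally; swapping the `StringReader` for a file reader would be the obvious next step, but whether it fits depends on whether your aggregation really needs random access to earlier records.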
You could also store this data outside the JVM heap, in a key-value store like Redis or a document store like CouchDB.