I am building a cache that has to store as much data as possible. CPU is not a major issue, because the next level of data is a lot more expensive to reach than spending a few CPU cycles on decompression.
I'm looking for a good strategy, not a full implementation. A typical object instance to be cached can be generalized as a list of hashmaps. The keys in each of these maps are very similar to the keys in the other maps of the same list. Keys and values are strings.
Maps in different caching objects (this means also different lists) may not always have similar keys. Maybe only a subset (50%) of the keys is the same.
I was thinking of extracting the keys into ONE header array and the values of each hashmap into a data array of the same length. This means a data array might be sparse (null pointers), but I don't have to carry the metadata around: the position in the data array is the only thing needed to look up the matching key.
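To make the layout concrete, here is a minimal sketch of the header-plus-data-arrays idea, assuming every map in one list is converted together. The class and method names (`ColumnarMaps`, `get`) are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one shared header array plus one value array per map.
class ColumnarMaps {
    final String[] header;                        // each distinct key, stored once
    final String[][] rows;                        // rows[i][j] = value of header[j] in map i, or null
    private final Map<String, Integer> position = new LinkedHashMap<>();

    ColumnarMaps(List<Map<String, String>> maps) {
        // First pass: assign every distinct key a column position.
        for (Map<String, String> m : maps) {
            for (String key : m.keySet()) {
                position.putIfAbsent(key, position.size());
            }
        }
        header = position.keySet().toArray(new String[0]);

        // Second pass: copy values into their columns; missing keys stay null.
        rows = new String[maps.size()][header.length];
        for (int i = 0; i < maps.size(); i++) {
            for (Map.Entry<String, String> e : maps.get(i).entrySet()) {
                rows[i][position.get(e.getKey())] = e.getValue();
            }
        }
    }

    // Looks up a value by row and key; returns null if the key is absent.
    String get(int row, String key) {
        Integer col = position.get(key);
        return col == null ? null : rows[row][col];
    }
}
```

The sparser the rows get, the more null slots each data array carries, so this layout pays off mainly when the maps in one list really do share most of their keys.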
Now I want to compress the data array. Compression won't work well on a single data array because it contains too little redundancy. It will take a few data arrays stuck together to get a good compression ratio.
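One way to try this out is to serialize a batch of data arrays into a single DEFLATE stream with `java.util.zip` and compare the result with compressing each array on its own. The sketch below assumes that approach; the class name `BatchCompressor` and the null-marker encoding are illustrative choices, not a recommendation of a specific format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical sketch: compress several data arrays together as one stream.
class BatchCompressor {

    static byte[] compress(String[][] rows) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try (DataOutputStream out =
                 new DataOutputStream(new DeflaterOutputStream(bytes, deflater))) {
            out.writeInt(rows.length);
            for (String[] row : rows) {
                out.writeInt(row.length);
                for (String value : row) {
                    out.writeBoolean(value != null);   // null marker for sparse slots
                    if (value != null) {
                        out.writeUTF(value);           // limited to 65535 encoded bytes per string
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end();                            // release native resources
        }
        return bytes.toByteArray();
    }

    static String[][] decompress(byte[] data) {
        try (DataInputStream in = new DataInputStream(
                 new InflaterInputStream(new ByteArrayInputStream(data)))) {
            String[][] rows = new String[in.readInt()][];
            for (int i = 0; i < rows.length; i++) {
                rows[i] = new String[in.readInt()];
                for (int j = 0; j < rows[i].length; j++) {
                    rows[i][j] = in.readBoolean() ? in.readUTF() : null;
                }
            }
            return rows;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

How many arrays to batch is best measured on real data: larger batches give the compressor more repetition to exploit, but every lookup then has to decompress the whole batch.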
Is there any good way of compressing string arrays in Java? How many of these data arrays should I cluster for good results?
Is there maybe some better approach? This is an open question for collecting ideas, so please feel free to elaborate 🙂
Flyweight can help
If you are not compressing, you can use the Flyweight pattern to avoid the cost of the same key string being repeated in each object.
Remember that a string is an object, so a key in your hashmap is a reference to it. If many objects with the same property hold references to the same string object, each of them pays only for the reference (4 bytes on a 32-bit JVM or with compressed object pointers, 8 bytes otherwise) and there is only one copy of the string in memory.
How do you ensure the string objects are shared between objects? You can use something similar to String.intern(), but please don't use String.intern() itself. Interning a string means returning the same string object for the same string value, which requires a cache of those strings. The reason I don't recommend String.intern() is that its cache belongs to the JVM, so you have no control over its size or over when entries are released (on older JVMs it lived in the fixed-size permanent generation). But you can implement something analogous.
This code returns your own string if it's new, and the first stored one if it's not.
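A minimal sketch of such a method, assuming the name `returnUniqueString` used later in this answer and a weakly referenced pool so that unused strings can still be garbage collected. The class name `StringPool` is made up.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical sketch of an application-owned alternative to String.intern().
class StringPool {
    // Keys are held weakly by WeakHashMap; the value is wrapped in a WeakReference
    // so that it does not keep its own key strongly reachable.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    // Returns the first instance seen for this value, or s itself if it is new.
    synchronized String returnUniqueString(String s) {
        if (s == null) {
            return null;
        }
        WeakReference<String> ref = pool.get(s);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            pool.put(s, new WeakReference<>(s));
            canonical = s;
        }
        return canonical;
    }
}
```

Because the pool is an ordinary object, it can be cleared or dropped together with the cache it serves, which is the control that String.intern() does not offer.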
But not if you are compressing
Because compressing means you are serializing your object graphs, and each property name will be serialized as a separate string, so it repeats itself. The compressed size may not grow much, since it's a repeated string, but when you rehydrate the objects each of those strings will be created separately.
Maybe you can use returnUniqueString at the time of rehydrating 🙂
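As a rough illustration of that idea, the sketch below passes every key through `returnUniqueString` while reading maps back from a byte stream. The wire format, the class name `Rehydrator`, and the helpers `writeMap` and `readMap` are assumptions made for this example. The pool here is a plain `HashMap` for brevity, so it holds its strings strongly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: share key strings again while rehydrating serialized maps.
class Rehydrator {
    private final Map<String, String> keys = new HashMap<>();

    // Returns the first instance seen for this value, or s itself if it is new.
    String returnUniqueString(String s) {
        String first = keys.putIfAbsent(s, s);
        return first != null ? first : s;
    }

    // Assumed wire format: entry count, then key/value pairs as modified UTF-8.
    static byte[] writeMap(Map<String, String> map) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(map.size());
            for (Map.Entry<String, String> e : map.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    Map<String, String> readMap(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            Map<String, String> map = new HashMap<>();
            for (int i = 0; i < size; i++) {
                // readUTF creates a fresh String each time; the pool replaces it
                // with the shared instance so equal keys cost one object in total.
                String key = returnUniqueString(in.readUTF());
                map.put(key, in.readUTF());
            }
            return map;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```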