I have a table with around 20 columns, mostly varchars and decimals. The table has almost 1.5M rows, but many values repeat: column1 contains only 100 distinct strings, column2 almost 1000, and column3 almost 500.
Right now, I am storing all these column values in a map, with the first 5 columns as the key and the rest of the columns as the data. My task requires all of this to be initialized when the task starts.
What pattern (like Flyweight, etc.) or data structure should I use to minimize my object storage?
Why do I need to pre-load all the data?
Think of the whole table as a tree in which the victims can be at any leaf, on the trunk, or at the root. So for each entry (these come from a different place), I need to see whether there is any match in the tree.
Interning is not the best option. Garbage collecting interned strings from the PermGen space is possible, but it is not something the VM is optimized for.
You can write your own CharSequence implementation that is backed by shared char[] arrays.
With a CharSequence implementation you'll be able to implement basic sharing semantics like those of interned strings, or more complicated ones that take substrings and other projections into account.
A custom CharSequence implementation can also be optimized to perform fewer memory allocations than the String class, which copies char[] arrays around (for safety reasons that are not necessary if you have the backing char[] under your full control). Even new String("..").intern() will instantiate a new String instance (with its own char[] array) that is rapidly garbage collected.
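To make the idea concrete, here is a minimal sketch of such a CharSequence, under some assumptions. The names SharedChars and SharedCharsPool are made up for illustration and are not part of any library. The sketch assumes the backing char[] is never modified after construction, and it uses a plain HashMap as the pool, so the pooled values live on the ordinary heap.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical flyweight string: a view (offset, length) onto a char[] that may be shared. */
final class SharedChars implements CharSequence {
    private final char[] data;   // backing array, assumed never modified after construction
    private final int offset;
    private final int length;
    private int hash;            // computed lazily; 0 means "not computed yet"

    SharedChars(char[] data, int offset, int length) {
        if (offset < 0 || length < 0 || offset > data.length - length) {
            throw new IndexOutOfBoundsException();
        }
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    static SharedChars of(String s) {
        return new SharedChars(s.toCharArray(), 0, s.length());
    }

    @Override public int length() { return length; }

    @Override public char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException();
        return data[offset + index];
    }

    /** Returns a view onto the same char[]; no characters are copied. */
    @Override public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) throw new IndexOutOfBoundsException();
        return new SharedChars(data, offset + start, end - start);
    }

    /** Content equality, so that equal values collapse to one entry in the pool. */
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SharedChars)) return false;
        SharedChars other = (SharedChars) o;
        if (other.length != length) return false;
        for (int i = 0; i < length; i++) {
            if (data[offset + i] != other.data[other.offset + i]) return false;
        }
        return true;
    }

    @Override public int hashCode() {
        int h = hash;
        if (h == 0) {
            for (int i = 0; i < length; i++) h = 31 * h + data[offset + i];
            hash = h;
        }
        return h;
    }

    @Override public String toString() { return new String(data, offset, length); }
}

/** Hypothetical pool that plays the role of intern(), but on the ordinary heap. */
final class SharedCharsPool {
    private final Map<SharedChars, SharedChars> pool = new HashMap<>();

    /** Returns the canonical instance for the given content, adding it if absent. */
    SharedChars canonical(String s) {
        SharedChars candidate = SharedChars.of(s);
        SharedChars existing = pool.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    int size() { return pool.size(); }
}
```

With something like this, each row would hold references obtained from pool.canonical(value), so a column with only 100 distinct strings keeps 100 SharedChars instances alive no matter how many rows refer to them. Note that this sketch still allocates a temporary candidate on every lookup; a more careful version could probe the pool directly from the source characters to avoid that.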