I have an application that manages a large number of strings. The strings are in a path-like format and share many common parts, though not according to any clear rule. They are not paths on the file system but can be treated as such.
I clearly need to optimize memory consumption but without a big performance sacrifice.
I am considering 2 options:
– implement a compressed_string class that stores data zipped, but I need a fixed dictionary and I can’t find a library for this right now. I don’t want Huffman coding on bytes; I want it on words.
– implement some kind of flyweight pattern on string parts.
The problem looks like a common one, and I’m wondering what the best solution is, or whether someone knows a library that targets this issue.
Thanks
In the sense that they are locators in a hierarchy of the form name, (separator, name)*? If so, you can use interning: store the name parts as char const * elements that point into a pool of strings. That way, you effectively compress a name that is used n times to just over n * sizeof(char const *) + strlen(name) bytes. The full path would become a sequence of interned names, e.g. a std::vector<char const *>.
It might seem that sizeof(char const *) is big on 64-bit hardware, but you also save some of the allocation overhead. Or, if you know for some reason that you’ll never need more than, say, 65536 strings, you might store them as uint16_t keys into a NAME_TABLE, where NAME_TABLE is a static std::unordered_map<uint16_t, char const *>.
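To make the interning idea concrete, here is a minimal sketch of the pointer-based variant. The names NamePool, InternedPath, make_path and to_string are hypothetical, chosen only for illustration, and the '/' separator is an assumption about your format. The pool is a std::unordered_set<std::string>, which is node-based, so the pointers it hands out should stay valid as the pool grows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical pool that owns exactly one copy of each distinct name part.
class NamePool {
public:
    // Returns a pointer to the pooled copy of `name`, inserting it if needed.
    // Elements of std::unordered_set are never moved by rehashing, so the
    // returned pointer remains valid for the lifetime of the pool.
    char const *intern(std::string const &name) {
        return pool_.insert(name).first->c_str();
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};

// A full path is just a sequence of interned name parts.
using InternedPath = std::vector<char const *>;

// Splits `path` on `sep` and interns every non-empty part.
// Assumption: empty parts (leading or doubled separators) carry no meaning
// and are dropped, so a leading separator is not preserved.
InternedPath make_path(NamePool &pool, std::string const &path, char sep = '/') {
    InternedPath parts;
    std::size_t start = 0;
    while (start <= path.size()) {
        std::size_t end = path.find(sep, start);
        if (end == std::string::npos) end = path.size();
        if (end > start)
            parts.push_back(pool.intern(path.substr(start, end - start)));
        start = end + 1;
    }
    return parts;
}

// Rebuilds the textual form of a path from its interned parts.
std::string to_string(InternedPath const &parts, char sep = '/') {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) out += sep;
        out += parts[i];
    }
    return out;
}
```

With this sketch, interning "usr/share/doc" and "usr/share/man" through the same pool stores "usr" and "share" once each, and two parts are equal exactly when their pointers are equal, so comparisons become pointer comparisons. Note that each std::vector still has its own allocation overhead, so whether this pays off depends on how long and how repetitive your paths are.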