I have a C# program that uses a production grammar to generate 3D models of trees, flowers, and similar organic entities (see the Wikipedia entry on L-systems for more info). When I’m generating a large tree with leaves, I (expectedly) get exponential growth in the string, which would run to hundreds of gigabytes if I let it (and I’d like to).
Constraints – I have to do this (sort of) in C# – the C++/native side is busy compiling and rendering the rather immense geometry that’s produced.
So StringBuilder is right out: even if it could handle it, I don’t have enough memory!
I don’t want to do a pure file-based solution – waaaaaayyyyyyyy toooooooooooo sloooooooooooowwww!
I can’t change the grammar – I realize I could compress the standard L-system notation, but it’s a context-sensitive grammar, so once you’ve got it working, you become positively superstitious about fiddling with it.
Things I’ve considered
Memory-mapped files – I don’t mind using P/Invoke to get to the native layer to support things, I just don’t want to rewrite the whole production system in C++ – but I haven’t found much in the way of handy libraries for C# to access this functionality.
Low-level mucking about with memory management, page faulting, etc. – but hey, if I did that I might as well sell it as a product. It makes the slow pure file-based solution look like not such a bad idea.
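For reference, the memory-mapped option might look roughly like the sketch below. It assumes a runtime that includes the System.IO.MemoryMappedFiles namespace (.NET Framework 4.0 or later), which would avoid P/Invoke altogether. The `MappedSymbols` class, the `RoundTrip` method, and the one-byte-per-symbol encoding are made up for illustration and are not part of my actual production system.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical helper: stores grammar symbols one byte each in a file-backed mapping.
public static class MappedSymbols
{
    // Writes the symbols through a mapped view, then reads them back.
    // Assumes symbols.Length > 0, because a mapping needs a positive capacity.
    public static byte[] RoundTrip(string path, byte[] symbols)
    {
        using (var mmf = MemoryMappedFile.CreateFromFile(
                   path, FileMode.Create, null, symbols.Length))
        using (var view = mmf.CreateViewAccessor(0, symbols.Length))
        {
            // The OS pages the backing file in and out on demand.
            view.WriteArray(0, symbols, 0, symbols.Length);

            var back = new byte[symbols.Length];
            view.ReadArray(0, back, 0, back.Length);
            return back;
        }
    }
}
```

For a real multi-gig string I would presumably map a sliding window over the file rather than the whole thing, since a single view is still limited by the available address space.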
Anybody got any ideas here? How do I efficiently traverse/manipulate/expand multi-gig strings produced by a production grammar?
You’re quite right that the typical approach to compression involves the notion of a pre-existing plaintext. What I’m talking about here is something like the idea of using a trie data structure as opposed to a dictionary. It’s not just about passively compressing, but rather using an inherently more compact representation that encodes the redundancies implicitly. If you’re at the 100G mark today, you’re within an order of magnitude of bursting past the limits of affordable hard drives, so you might benefit from rethinking the solution.
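To make "encoding the redundancies implicitly" a bit more concrete, here is a minimal sketch of one such representation: keep only the axiom and the rules, and generate the expanded symbols lazily on demand instead of ever storing the string. This is lazy streaming, not a trie, and it assumes context-free rules keyed on a single character. Your context-sensitive grammar would need extra machinery to see neighbouring symbols, so treat this as an illustration of the idea rather than a drop-in fix. The `LazyLSystem` class and `Expand` method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: streams an L-system expansion without materializing it.
public static class LazyLSystem
{
    // Yields the symbols of `axiom` after `depth` rewriting passes.
    // Assumes context-free rules; symbols without a rule are constants.
    public static IEnumerable<char> Expand(
        string axiom, IDictionary<char, string> rules, int depth)
    {
        foreach (char symbol in axiom)
        {
            string replacement;
            if (depth > 0 && rules.TryGetValue(symbol, out replacement))
            {
                // Recurse into the replacement with one fewer pass remaining.
                foreach (char c in Expand(replacement, rules, depth - 1))
                    yield return c;
            }
            else
            {
                // Either no passes remain or the symbol is a constant.
                yield return symbol;
            }
        }
    }
}
```

With Lindenmayer's algae rules (A becomes AB, B becomes A), `new string(LazyLSystem.Expand("A", rules, 3).ToArray())` gives `"ABAAB"`. The consumer can walk the sequence with `foreach` and hand symbols to the renderer as they arrive. Live memory is proportional to the recursion depth rather than the output length, at the cost of recomputing the expansion on every traversal.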