I am trying to calculate an initial buffer size to use when decompressing data of an unknown size. I have a bunch of data points from existing compression streams but don’t know the best way to analyze them.
Each data point is a compressed size and its ratio to the uncompressed size.
For example:
100,425 (compressed size) × 1.3413 (compression ratio) ≈ 134,700 (uncompressed size)
The compressed data stream doesn’t store the uncompressed size, so the decompressor has to allocate an initial buffer and realloc if it overflows. I’m looking for the “best” initial size to allocate given the compressed size. I have over 293,000 data points.
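For reference, the grow-on-overflow pattern I mean is roughly this (a minimal sketch; `try_decompress` is a hypothetical stand-in for whatever decompressor is in use):

```c
#include <stdlib.h>

/* Hypothetical decompressor: returns 0 on success,
 * nonzero if dst_cap is too small. */
int try_decompress(const unsigned char *src, size_t src_len,
                   unsigned char *dst, size_t dst_cap, size_t *out_len);

/* Decompress into a buffer that starts at `initial` bytes
 * and doubles on overflow. Returns NULL on allocation failure. */
unsigned char *decompress_auto(const unsigned char *src, size_t src_len,
                               size_t initial, size_t *out_len)
{
    size_t cap = initial;
    unsigned char *buf = malloc(cap);
    if (!buf) return NULL;

    while (try_decompress(src, src_len, buf, cap, out_len) != 0) {
        cap *= 2;                                /* grow and retry */
        unsigned char *tmp = realloc(buf, cap);
        if (!tmp) { free(buf); return NULL; }
        buf = tmp;
    }
    return buf;
}
```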
Given that you have a lot of data points on how your compression behaves, I’d recommend analyzing them to get a mean compression ratio and a standard deviation. Then set your initial buffer size to the compressed size multiplied by the ratio two standard deviations above the mean; assuming the ratios are roughly normally distributed, that buffer will be big enough for about 97.7% of your cases (only streams whose ratio falls more than 2σ above the mean will overflow it). If you want the buffer to avoid reallocation in more cases, increase the number of standard deviations above the mean that you allocate for, as in the sketch below.
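A minimal sketch of that calculation, assuming you have the observed ratios in an array (`ratio_at_sigmas` and `initial_buffer_size` are illustrative names, not an existing API):

```c
#include <math.h>
#include <stddef.h>

/* Mean plus k standard deviations of the observed compression ratios
 * (two-pass computation for numerical stability). */
double ratio_at_sigmas(const double *ratios, size_t n, double k)
{
    double mean = 0.0;
    for (size_t i = 0; i < n; i++)
        mean += ratios[i];
    mean /= (double)n;

    double var = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = ratios[i] - mean;
        var += d * d;
    }
    var /= (double)n;                 /* population variance */

    return mean + k * sqrt(var);
}

/* Initial buffer size for a new stream: compressed size times the
 * ratio two standard deviations above the mean, rounded up. */
size_t initial_buffer_size(size_t compressed_size,
                           const double *ratios, size_t n)
{
    return (size_t)((double)compressed_size
                    * ratio_at_sigmas(ratios, n, 2.0)) + 1;
}
```

Raising `k` from 2.0 to 3.0 would cover roughly 99.9% of cases under the same normality assumption, at the cost of over-allocating for typical streams; with 293,000 samples it’s worth checking a histogram of the ratios, since a long right tail would make the normal-based percentages optimistic.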