I have a text file that I want to send over the network. Its size could range from as little as 1 KB up to 500 KB.
What algorithms or techniques could I use to compress this file as tightly as possible before sending it, so that the fewest bytes are sent over the network?
For compression, I’d consider gzip, bzip2 and LZMA (this is not an exhaustive list, but these are IMO the best known).
Then, I’d look for some benchmarks on the net and try to gather metrics for various file types (text, binary, mixed) and sizes (small, big, huge). Even if you’re mostly interested in the compression ratio, you might want to look at all of these: the compression ratio, the compression time, the memory footprint, and the decompression time.
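To make those metrics concrete, here is a minimal sketch that measures compression ratio, compression time and decompression time for gzip using only JDK classes. The class name `GzipMetrics`, the helper names and the generated sample text are all assumptions for illustration. The sample is highly repetitive, so it will compress far better than real files would; substitute your own data before drawing conclusions. Memory footprint is not measured here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipMetrics {

    // Compress a byte array with gzip using only JDK classes.
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } // closing the stream writes the gzip trailer
        return out.toByteArray();
    }

    // Reverse of gzip(): inflate the bytes back to the original content.
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return out.toByteArray();
    }

    // Compressed size divided by original size; lower is better.
    static double ratio(byte[] original, byte[] compressed) {
        return (double) compressed.length / original.length;
    }

    // Hypothetical sample generator: repetitive ASCII text of roughly the requested size.
    static byte[] sampleText(int approxBytes) {
        String line = "The quick brown fox jumps over the lazy dog.\n";
        StringBuilder sb = new StringBuilder();
        while (sb.length() < approxBytes) {
            sb.append(line);
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Sizes chosen to span the 1 KB to 500 KB range from the question.
        for (int size : new int[] {1_024, 50 * 1_024, 500 * 1_024}) {
            byte[] original = sampleText(size);
            long t0 = System.nanoTime();
            byte[] compressed = gzip(original);
            long t1 = System.nanoTime();
            byte[] restored = gunzip(compressed);
            long t2 = System.nanoTime();
            if (!Arrays.equals(original, restored)) {
                throw new IllegalStateException("round trip changed the data");
            }
            System.out.printf("%d bytes -> %d bytes (ratio %.3f), compress %d us, decompress %d us%n",
                    original.length, compressed.length, ratio(original, compressed),
                    (t1 - t0) / 1_000, (t2 - t1) / 1_000);
        }
    }
}
```

A single timed run like this is noisy (JIT warm-up in particular), so for real numbers you would want to repeat each measurement several times.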
According to A Quick Benchmark: Gzip vs. Bzip2 vs. LZMA:
This is confirmed in LZMA – better than bzip2:
So, for the compression of text files, the same site reports:
Finally, here is another resource with graphical results: Compression Tools: lzma, bzip2 & gzip
I’d really recommend running your own benchmark (as you’ll be compressing text only, and very small to small files) to get real metrics in your environment, but my bet is that LZMA won’t provide a significant advantage on small text files, so bzip2 would be a decent choice (even if the time and memory overhead of LZMA might be low on small files).
If you plan to perform the compression from Java, you’ll find an LZMA implementation here and a bzip2 implementation here (coming from Apache Ant, AFAIK); gzip is included in the JDK. If you don’t want to or can’t rely on a third-party library, use gzip.
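For the JDK-only gzip route, a minimal sketch of compressing on the fly while writing to a network stream could look like the following. The class and method names (`GzipStreaming`, `sendCompressed`, `receiveCompressed`) are hypothetical, and the in-memory byte array in `main` merely stands in for a socket. It also assumes one file per connection, since closing the gzip stream closes the underlying stream as well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipStreaming {

    // Copy everything from source to sink, gzip-compressing as it goes.
    // Closing the gzip stream also closes sink (assumed: one file per connection).
    static void sendCompressed(InputStream source, OutputStream sink) throws IOException {
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = source.read(buffer)) != -1) {
                gz.write(buffer, 0, n);
            }
        }
    }

    // Read gzip data from source and write the decompressed bytes to sink.
    static void receiveCompressed(InputStream source, OutputStream sink) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(source)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build some sample text; in practice this would be the file's input stream.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("line ").append(i).append(": some text to compress\n");
        }
        byte[] text = sb.toString().getBytes(StandardCharsets.UTF_8);

        // "wire" stands in for socket.getOutputStream() on the sending side.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        sendCompressed(new ByteArrayInputStream(text), wire);

        // On the receiving side, the input would be socket.getInputStream().
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        receiveCompressed(new ByteArrayInputStream(wire.toByteArray()), received);

        System.out.println("original bytes: " + text.length);
        System.out.println("bytes on the wire: " + wire.size());
        System.out.println("round trip ok: " + Arrays.equals(text, received.toByteArray()));
    }
}
```

Streaming this way avoids holding the compressed copy in memory, which matters little at 500 KB but keeps the same code usable for larger files.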