In my application, I need to store and transmit data that contains many repeating

Question


Asked: May 12, 2026


In my application, I need to store and transmit data that contains many repeating string values (think entity names in an XML document). I have two proposed solutions:

A) create a string table to be stored alongside the document, and then use index references (using a multi-byte encoding) in the document body, or
B) simply compress the document using gzip or a similar compression algorithm.

Which one is likely to perform better in terms of speed and data size? (Obviously, this depends on the quality of the implementations, but assume that option A builds an array of strings dynamically and encodes the document body in some reasonable fashion.)
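To make option A concrete, here is a minimal sketch of what I have in mind. The function names and the LEB128-style variable-length index encoding are assumptions chosen for illustration, not a fixed design:

```python
def encode_varint(n):
    """Encode a non-negative integer in 7-bit groups, low group first.

    The high bit of each byte signals that more bytes follow.
    """
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(data):
    """Decode a byte string made of back-to-back varints."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def encode_tokens(tokens):
    """Build the string table dynamically and emit index references."""
    table, index, body = [], {}, bytearray()
    for tok in tokens:
        if tok not in index:
            index[tok] = len(table)
            table.append(tok)
        body += encode_varint(index[tok])
    return table, bytes(body)


def decode_tokens(table, body):
    """Resolve index references back to the original strings."""
    return [table[i] for i in decode_varints(body)]


tokens = ["customer", "order", "customer", "item", "order", "customer"]
table, body = encode_tokens(tokens)
print(table, body, decode_tokens(table, body) == tokens)
```

With fewer than 128 distinct strings every reference fits in one byte; larger tables spill into two or more bytes per reference.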

Also, if option B, would you recommend a potentially more suitable compression method than gzip?


1 Answer

Editorial Team · Answer 1 · 2026-05-12T15:29:49+00:00

gzip is a good choice only when the cost of transmission and storage is not too high compared to the cost of CPU time. You can get better compression ratios with bzip2, 7-Zip (LZMA) and, especially for natural language, various PPM algorithms.
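The quickest way to settle this for your own data is to measure it. A rough sketch using only Python's standard library follows; the sample document and entity names are made up, and lzma stands in for 7-Zip's default method (the standard library has no PPM implementation, so that one is left out):

```python
import bz2
import gzip
import lzma

# Hypothetical sample: an XML-like document with a few repeating entity names.
names = ["customer", "order", "lineItem", "shippingAddress"]
doc = "".join(
    '<{0} id="{1}"/>\n'.format(names[i % len(names)], i) for i in range(5000)
).encode("utf-8")

compressors = {
    "gzip": gzip.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}

# Compressed size in bytes for each method, plus the uncompressed baseline.
sizes = {"raw": len(doc)}
for name, compress in compressors.items():
    sizes[name] = len(compress(doc))

for name, size in sizes.items():
    print("{0:>5}: {1:>7} bytes".format(name, size))
```

Wrap each call with time.perf_counter() if you also want to compare CPU cost, and run it on a representative real document rather than synthetic data like this.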

Of course, it’s not only computation (and static vs. dynamic memory requirements) vs. compression ratio that matters: different compression formats allow varying degrees of efficient random-access seeking, low-latency stream decoding, and concatenation of compressed data (e.g. cat a.gz b.gz | gunzip -c produces the same output as gunzip -c a.gz; gunzip -c b.gz).
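The concatenation property is easy to check. A small sketch with Python's gzip module, using made-up byte strings in place of the two files:

```python
import gzip

# Two independently compressed gzip members (stand-ins for a.gz and b.gz).
a = gzip.compress(b"first part\n")
b = gzip.compress(b"second part\n")

# Decompressing the concatenated bytes yields the concatenated plaintext,
# mirroring cat a.gz b.gz | gunzip -c.
combined = gzip.decompress(a + b)
separate = gzip.decompress(a) + gzip.decompress(b)

print(combined == separate)
```

This is what makes it possible to append newly compressed chunks to an existing gzip stream without recompressing what is already there.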
