One of the core steps in file compression like ZIP is to use the

Question


Asked: May 11, 2026


One of the core steps in file compression like ZIP is to use the previous decoded text as a reference source. For example, the encoded stream might say ‘the next 219 output characters are the same as the characters from the decoded stream 5161 bytes ago.’ This lets you represent 219 characters with just 3 bytes or so. (There’s more to ZIP than that, like Huffman compression, but I’m just talking about the reference matching.)
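To make the reference idea concrete, here is a minimal sketch of how a decoder might expand one such back-reference. The helper name `copy_match` and the sample bytes are made up for illustration; this is not zlib's code.

```python
def copy_match(out: bytearray, distance: int, length: int) -> None:
    """Append `length` bytes copied from `distance` bytes back in `out`."""
    if not 0 < distance <= len(out):
        raise ValueError("distance points outside the decoded data")
    start = len(out) - distance
    # Copy one byte at a time: when length > distance the source overlaps
    # the bytes being written, which lets a short run expand into a long one.
    for i in range(length):
        out.append(out[start + i])


decoded = bytearray(b"abcabc-")
copy_match(decoded, distance=7, length=3)  # re-emit the leading "abc"
copy_match(decoded, distance=1, length=4)  # repeat the last byte four times
print(decoded.decode())                    # abcabc-abccccc
```

The encoder's job, which is what the question is about, is to find good `(distance, length)` pairs in the first place.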

My question is what strategy (or strategies) the string matching algorithm uses. Even looking at source code from zlib and the like doesn’t seem to give a good description of the compression matching algorithm.

The problem might be stated as: ‘Given a block of text, say 30K of it, and an input string, find the longest reference in the 30K of text which exactly matches the front of the input string.’ The algorithm must be efficient when iterated, i.e., the 30K block of text will be updated by deleting some bytes from the front and adding new ones to the rear, and then a new match is performed.
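One common family of approaches to the problem as stated is a hash table keyed on the first few bytes of each position, with each entry holding a chain of earlier positions. The sketch below is an illustration under assumptions, similar in spirit to the hash-chain idea used in zlib's deflate but not taken from its code. The class `HashChainMatcher`, the helper `tokenize`, and all four constants are hypothetical names and values chosen for this example.

```python
from collections import defaultdict, deque

MIN_MATCH = 3        # shortest match worth encoding (assumed value)
MAX_MATCH = 258      # longest match allowed (assumed, DEFLATE-like)
WINDOW = 32 * 1024   # how far back a reference may point (assumed value)
MAX_CHAIN = 64       # cap on candidates tried per lookup (assumed value)


class HashChainMatcher:
    def __init__(self, data: bytes) -> None:
        self.data = data
        # Maps a 3-byte prefix to the positions where it starts, oldest first.
        self.table = defaultdict(deque)

    def insert(self, pos: int) -> None:
        """Record `pos` so that later lookups can find it."""
        key = self.data[pos:pos + MIN_MATCH]
        if len(key) == MIN_MATCH:
            self.table[key].append(pos)

    def longest_match(self, pos: int):
        """Return (distance, length) of the best match for data[pos:], or (0, 0)."""
        data = self.data
        chain = self.table.get(data[pos:pos + MIN_MATCH])
        if not chain:
            return 0, 0
        # Sliding the window: drop positions that are now too far back.
        while chain and chain[0] < pos - WINDOW:
            chain.popleft()
        limit = min(MAX_MATCH, len(data) - pos)
        best_dist, best_len = 0, 0
        # Try the newest candidates first, since nearer matches are cheaper.
        for tried, cand in enumerate(reversed(chain)):
            if tried >= MAX_CHAIN:
                break
            length = 0
            while length < limit and data[cand + length] == data[pos + length]:
                length += 1
            if length > best_len:
                best_dist, best_len = pos - cand, length
                if length == limit:
                    break
        return (best_dist, best_len) if best_len >= MIN_MATCH else (0, 0)


def tokenize(data: bytes) -> list:
    """Greedy parse into literal byte values and (distance, length) pairs."""
    matcher = HashChainMatcher(data)
    tokens, pos = [], 0
    while pos < len(data):
        dist, length = matcher.longest_match(pos)
        step = length if length else 1
        tokens.append((dist, length) if length else data[pos])
        for p in range(pos, pos + step):
            matcher.insert(p)
        pos += step
    return tokens


print(tokenize(b"abcabcabcabc"))  # [97, 98, 99, (3, 9)]
```

The main tradeoff in this sketch is `MAX_CHAIN`: a larger cap finds longer matches at the cost of more comparisons per position. Other structures, such as binary trees or suffix-based indexes over the window, trade memory and update cost differently.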

I’m a lot more interested in discussions of the algorithm(s) to do this, not source code or libraries. (zlib has very good source!) I suspect there may be several approaches with different tradeoffs.


1 Answer

score 0 · Answer 1 · 2026-05-11T09:15:59+00:00

Added an answer on May 11, 2026 at 9:15 am

You could look at the details of the LZMA algorithm used by 7-Zip. The 7-Zip author claims to have improved on the algorithm used by zlib et al.
