Backstory (skip to second-to-last paragraph for data structure part): I’m working on a compression

Question

0

Asked: May 26, 20262026-05-26T11:20:05+00:00 2026-05-26T11:20:05+00:00

Backstory (skip to second-to-last paragraph for data structure part): I’m working on a compression

0

Backstory (skip to second-to-last paragraph for data structure part): I’m working on a compression algorithm (of the LZ77 variety). The algorithm boils down to finding the longest match between a given string and all strings that have already been seen.

To do this quickly, I’ve used a hash table (with separate chaining) as recommended in the DEFLATE spec: I insert every string seen so far one at a time (one per input byte) with m slots in the chain for each hash code. Insertions are fast (constant-time with no conditional logic), but searches are slow because I have to look at O(m) strings to find the longest match. Because I do hundreds of thousands of insertions and tens of thousands of lookups in a typical example, I need a highly efficient data structure if I want my algorithm to run quickly (currently it’s too slow for m > 4; I’d like an m closer to 128).

I’ve implemented a special case where m is 1, which runs very fast buts offers only so-so compression. Now I’m working on an algorithm for those who’d prefer improved compression ratio over speed, where the larger m is, the better the compression gets (to a point, obviously). Unfortunately, my attempts so far are too slow for the modest gains in compression ratio as m increases.

So, I’m looking for a data structure that allows very fast insertion (since I do more insertions than searches), but still fairly fast searches (better than O(m)). Does an O(1) insertion and O(log m) search data structure exist? Failing that, what would be the best data structure to use? I’m willing to sacrifice memory for speed. I should add that on my target platform, jumps (ifs, loops, and function calls) are very slow, as are heap allocations (I have to implement everything myself using a raw byte array in order to get acceptable performance).

So far, I’ve thought of storing the m strings in sorted order, which would allow O(log m) searches using a binary search, but then the insertions also become O(log m).

Thanks!

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-26T11:20:06+00:00

Editorial Team

2026-05-26T11:20:06+00:00Added an answer on May 26, 2026 at 11:20 am

You might be interested in this match-finding structure :

http://encode.ru/threads/1393-A-proposed-new-fast-match-searching-structure

It’s O(1) insertion time and O(m) lookup. But (m) is many times lower than a standard Hash Table for an equivalent match finding result. As en example, with m=4, this structure gets equivalent results than an 80-probes hash table.

0

Reply
Share
Share

- Report

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

Backstory (skip to second-to-last paragraph for data structure part): I’m working on a compression

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply