This is a question I ran into in school settings, but it keeps bugging me so I decided to ask it here.
In Huffman compression, fixed-length sequences (characters) are encoded with variable-length sequences. The code sequence length depends on the frequencies (or probabilities) of the source characters.
My questions is: what is the minimum highest character frequency, with which that character will be encoded by a single bit?
It turns out that the answer is 0.4, that is, if the highest frequency p is p >= 0.4, 1-bit code for the corresponding character is guaranteed. In other words, this is a sufficient condition.
It is also true that p >= 1/3 is a necessary condition. That is, there may be examples where 0.4 > p >= 1/3, and the shortest code is 1-bit, but there are no cases like that if p < 1/3.
The way to reason about that is to look at the way the code tree is constructed, in particular at the frequencies of the 3 last surviving subtrees. A proof appears in Johnsen, “On the redundancy of binary Huffman codes”, 1980 (unfortunately this is a paid link).