A quick tutorial on generating a Huffman tree
Confused about Huffman trees. Near the end of the tutorial linked above, it shows the tree with two elements left, and then the completed tree. I'm confused about the way it is branched. Is there a specific way a Huffman tree needs to be branched?
For example, 57:* with its right child 35:* is branched off to the right. Could it have been 35 branched to the left with 22 branched to the right? Also, why wasn't 15:4 paired with 22:*? Instead, it was paired with 20:5 to create a new tree.
From initial observations, it seems the tree does not need to be balanced or have any specific order, other than that the frequencies of a node's two children add up to the value of the parent node. Could two people building a Huffman tree from the same data end up with different encodings?
The key to Huffman trees is this:
If there is a tie among the lowest frequencies (e.g. 3, 4, 4, ...), any pair will do as long as it really consists of the two smallest values (3 and either of the 4s, but not the two 4s). Also, it does not matter which of the two merged elements is assigned 0 and which is assigned 1. These two facts allow different, yet equally valid, Huffman encodings to arise from the same data.
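A minimal Python sketch of where that freedom comes from, assuming a standard heap-based construction (the names `huffman_codes` and `total_bits` are hypothetical, not from the tutorial). Ties are broken by insertion order, and a `swap_bits` flag flips which child of each merge gets 0. The three code tables below differ, but each encodes the 3, 4, 4 data in the same total number of bits.

```python
import heapq
from itertools import count

def huffman_codes(freqs, swap_bits=False):
    """Return {symbol: bitstring} for a Huffman code built from freqs.

    Ties in frequency are broken by insertion order (an arbitrary choice).
    swap_bits=False labels the first-popped (lower-frequency) child '0';
    swap_bits=True labels it '1'. Both give valid Huffman codes.
    """
    tiebreak = count()  # unique counter, so heapq never compares the nodes themselves
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # lowest frequency
        f2, _, right = heapq.heappop(heap)  # second-lowest frequency
        if swap_bits:
            left, right = right, left
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: (child0, child1)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                             # leaf: a symbol
            codes[node] = prefix or "0"   # lone-symbol input still gets a 1-bit code
    walk(heap[0][2], "")
    return codes

def total_bits(freqs, codes):
    """Encoded length of the whole input: sum of frequency * code length."""
    return sum(freqs[s] * len(codes[s]) for s in freqs)

# Frequencies 3, 4, 4: the two 4s tie for second-lowest.
freqs_bc = {"a": 3, "b": 4, "c": 4}   # tie broken so "a" is paired with "b"
freqs_cb = {"a": 3, "c": 4, "b": 4}   # same data, "a" is paired with "c"

codes1 = huffman_codes(freqs_bc)
codes2 = huffman_codes(freqs_cb)
codes3 = huffman_codes(freqs_bc, swap_bits=True)
print(codes1)  # {'c': '0', 'a': '10', 'b': '11'}
print(codes2)  # {'b': '0', 'a': '10', 'c': '11'}
print(codes3)  # {'b': '00', 'a': '01', 'c': '1'}
print(total_bits(freqs_bc, codes1), total_bits(freqs_cb, codes2), total_bits(freqs_bc, codes3))  # 18 18 18
```

Every Huffman tree for a given input is optimal, so the total encoded length is the same whichever valid choices are made; only the individual codes change.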
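The Huffman tree is supposed to be balanced by frequencies, not by the number of nodes: frequent symbols end up close to the root with short codes, and rare symbols end up deeper with long codes. A tree with the same depth on every branch is therefore not necessarily a Huffman tree, and a lopsided tree can be exactly right.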
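To make "balanced by frequencies" concrete, here is a small sketch that computes only each symbol's Huffman code length and compares the total cost against a tree balanced by node count (i.e. a fixed-length code). The frequencies and the helper names `huffman_code_lengths` and `weighted_bits` are hypothetical.

```python
import heapq
import math
from itertools import count

def huffman_code_lengths(freqs):
    """Code length (depth in the Huffman tree) of each symbol."""
    lengths = {sym: 0 for sym in freqs}
    tiebreak = count()  # unique counter, so the symbol lists are never compared
    heap = [(f, next(tiebreak), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, group1 = heapq.heappop(heap)  # two lowest-frequency subtrees
        f2, _, group2 = heapq.heappop(heap)
        for sym in group1 + group2:
            lengths[sym] += 1  # each merge pushes these symbols one level deeper
        heapq.heappush(heap, (f1 + f2, next(tiebreak), group1 + group2))
    return lengths

def weighted_bits(freqs, lengths):
    """Total encoded length: sum of frequency * code length."""
    return sum(freqs[s] * lengths[s] for s in freqs)

freqs = {"a": 50, "b": 25, "c": 15, "d": 10}  # hypothetical, skewed frequencies
huff = huffman_code_lengths(freqs)
# A tree balanced by node count: every symbol sits at the same depth.
flat = {s: math.ceil(math.log2(len(freqs))) for s in freqs}

print(huff)                                                  # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
print(weighted_bits(freqs, huff), weighted_bits(freqs, flat))  # 175 200
```

The Huffman tree here is lopsided (depths 1, 2, 3, 3), yet it needs fewer bits than the node-balanced tree because the deep leaves are the rare ones.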
Specifically, in your question, 15 is paired with 20 and not 22 because 15 and 20 are the two lowest remaining values (both are lower than 22). Either branching (left or right) would have been fine, as long as the same convention (always smaller-left, or always smaller-right) is applied consistently by both encoder and decoder, so that the encoding can be reconstructed at the other end.
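The merge order and the need for a shared convention can be sketched as follows. This assumes a "smaller child goes left" rule and uses hypothetical frequencies that happen to include 15, 20 and 22 (they are not the tutorial's data); `build_tree`, `codes_from` and `decode` are illustrative names.

```python
import heapq
from itertools import count

def build_tree(freqs, trace=None):
    """Build a Huffman tree: leaves are symbols, internal nodes are (left, right).

    Convention assumed here: the lower-frequency child of each merge goes
    left (bit '0'). Any fixed rule works if encoder and decoder share it.
    Assumes at least two distinct symbols.
    """
    tiebreak = count()
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)     # lowest remaining frequency
        f2, _, high = heapq.heappop(heap)    # second-lowest remaining frequency
        if trace is not None:
            trace.append((f1, f2, f1 + f2))  # record which pair was merged
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (low, high)))
    return heap[0][2]

def codes_from(tree, prefix="", table=None):
    """Walk the tree: '0' for the left child, '1' for the right child."""
    table = {} if table is None else table
    if isinstance(tree, tuple):
        codes_from(tree[0], prefix + "0", table)
        codes_from(tree[1], prefix + "1", table)
    else:
        table[tree] = prefix
    return table

def decode(bits, tree):
    """Decode by walking the same tree the encoder used."""
    out, node = [], tree
    for b in bits:
        node = node[int(b)]              # '0' -> left child, '1' -> right child
        if not isinstance(node, tuple):  # reached a leaf: emit it, restart at root
            out.append(node)
            node = tree
    return "".join(out)

# Hypothetical frequencies chosen to include 15, 20 and 22.
freqs = {"p": 15, "q": 20, "r": 22, "s": 40}
trace = []
tree = build_tree(freqs, trace)
print(trace)  # [(15, 20, 35), (22, 35, 57), (40, 57, 97)]

codes = codes_from(tree)
bits = "".join(codes[ch] for ch in "sqrp")
print(codes, bits)
print(decode(bits, tree))  # sqrp
```

Because 15 and 20 are the two smallest values, they are merged first into 35; only then is 22 one of the two smallest, so it is paired with 35 to make 57. A decoder that used the opposite left/right rule would build a mirrored tree and could read the same bits as different symbols, which is why the convention has to be shared.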