Possible Duplicate:
need help on how to encode words using huffman code
Suppose I have the following Huffman-coded symbols:
A – 0
B – 10
C – 110
D – 111
and that I want to encode the sequence
A B A A C D A D B B
then I get the binary code like this:
01000110 11101111 010
(01000110) = 0x46
(11101111) = 0xEF
010 = ????
If it weren't for the trailing 010, I could simply save these bytes to a file.
How should I handle this 010? Saving it as 00000010 doesn't work.
You’ll need a header for your encoded data. One byte is enough for this purpose, though you may need more depending on what else you want to store. In that header you can record how many padding bits the last data byte contains. In your example, the trailing 010 would be padded on the right (for example to 01000000), and the header byte would contain 5, because the last 5 bits of the last byte must be ignored.
Recovering the symbols from the useful bits has to be done bit by bit, because a code may be split between two bytes: some of its bits may sit at the end of byte N and the rest at the start of byte N+1.
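The approach described above could be sketched in Python roughly as follows. This is only an illustration, not a definitive file format: the function names `encode` and `decode`, the one-byte header layout, and the choice of zero bits for padding are all assumptions made for the example.

```python
# Code table from the question (a prefix code: no code is a prefix of another).
CODES = {"A": "0", "B": "10", "C": "110", "D": "111"}


def encode(symbols):
    """Return one header byte (padding-bit count) followed by the packed data.

    Assumed layout for this sketch: the padding is made of zero bits placed
    at the low-order end of the last data byte.
    """
    bits = "".join(CODES[s] for s in symbols)
    padding = (8 - len(bits) % 8) % 8      # 0..7 unused bits in the last byte
    bits += "0" * padding
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return bytes([padding]) + data


def decode(blob):
    """Inverse of encode(): walk the payload one bit at a time."""
    padding, data = blob[0], blob[1:]
    decode_table = {code: sym for sym, code in CODES.items()}
    total_bits = len(data) * 8 - padding   # skip the padding bits at the end
    symbols, current = [], ""
    for i in range(total_bits):
        byte = data[i // 8]
        bit = (byte >> (7 - i % 8)) & 1    # most significant bit first
        current += str(bit)
        if current in decode_table:        # complete code, possibly spanning two bytes
            symbols.append(decode_table[current])
            current = ""
    return symbols


message = "A B A A C D A D B B".split()
blob = encode(message)
print(blob.hex())                          # 0546ef40
assert decode(blob) == message
```

For the sequence in the question this writes the header byte 0x05 followed by 0x46, 0xEF and 0x40. Because the decoder keeps its partial code in `current` across byte boundaries, a code that starts in byte N and ends in byte N+1 needs no special handling.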