Implementing huffman decoding
by
Arturo San Emeterio Campos







Disclaimer
Copyright (c) Arturo San Emeterio Campos 2000. All rights reserved. Permission is granted to make verbatim copies of this
document for private use only.

Note
"Compression programming" is on a early stage. However the text itself is complete and will suffer only minor modifications.
 

Table of contents
How to decode the codes
Implementing the decompression process
 
 

How to decode the codes

Let's recall again our example. At the start of the decoding process we are in the root of the huffman binary tree. Then we get a bit, and depending if it's 0 or 1 we go to the left or the right respectively. Of course this process stops when we hit a leaf, which contains the symbol to decode.

Let's see an example, we have the following bit stream "00110". We get the bits left to right. At the start we are at the root. The first bit we get is "0" so we go to the left, F4 because it's not a literal we keep on decoding. We then get another "0" and go to the left again. F2, a fictive node, thus we have to get another bit. Note that we don't need to store any information about the previous bits. Now it's a "1", finally we hit a literal "i", we can output it.

If we had to decode another symbol, first we would go to the right because we had a "1". Then in F3 a "0" would lead us to the symbol "a" and we would output it.
 

Implementing the decompression process

At first we have as much as the compressor does, a probability distribution. The compressor made a code table. The decompressor doesn't use this method though. It instead keeps the whole huffman binary tree, and of course a pointer to the root to do the recursion process.

In your implementation you'll make the tree as usual and then you'll store a pointer to last node in the list, which is the root. Then the process can start. We'll navigate the tree by using the pointers to the children that each node has. This process is done by a recursive function which accepts as a parameter a pointer to the current node, and returns the symbol (in our case a byte). This is the code for this function:

byte huffman_decode_byte ( pointer to binary tree node )
{
//First we have to check if we are in a fictive node or in leaf.
//In our case we decided that fictive nodes have a value higher
//than any other symbol. If using bytes, 256. That's the way
//we know it.

if( node->value != 256 )
 {
 //It's a leaf, return the symbol
 return node->value;
 }
else
 {
 //We are on a fictive node, we have to right to the
 //left or the right node depending on the next bit.
 //We'll return the value returned by the function
 //called, which will be the decoded byte.
 if( get_bit() == 0)
  {
  //Go to the left
  return ( huffman_decode_byte( node->left_node ) );
  }
 else
  {
  //Go to the right
  return ( huffman_decode_byte( node->right_node ) );
  }
 }

}

This function would be called using as a parameter the pointer to the root. When the byte is found it will be returned to the calling function, and at last to the original call.
 


First version of this article: 11-Sept-2000 Last update of this article: 17-Sept-2000
This is part of Compression programming by Arturo Campos (email address: arturo@arturocampos.com).
http://www.arturocampos.com Visit again soon for new and updated compression articles and software.