Huffman algorithm, making codes from probabilities
by
Arturo San Emeterio Campos








Disclaimer
Copyright (c) Arturo San Emeterio Campos 2000. All rights reserved. Permission is granted to make verbatim copies of this
document for private use only.

Note
"Compression programming" is on a early stage. However the text itself is complete and will suffer only minor modifications.
 

Table of contents
Making binary codes from probabilities
Huffman algorithm
Huffman example step by step
 
 

Making binary codes from probabilities

We have a text like "aaabc", with the probabilities you can see below. Because we can output bits, what about if we make a binary decision? the most probable or the rest, 0 or 1. In our example, 0 for 'a', then 1 for the rest, 'b' and 'c', and then we have to choose again: 0 for 'b' and 1 for 'c'. Using that codes we can output "01001110" and the decoder will correctly decode "abacb". That is because the codes have the prefix property and also are instantaneously decodable. Notice too that they are close to the entropy.
 
Symbol Probability Code Entropy (ideal code length)
a 3/5 0 (1 bit) 0.737 bits
b 1/5 10 (2 bits) 2.322 bits
c 1/5 11 (2 bits) 2.322 bits

Because we are doing a binary decision, we could represent it with a binary tree. A binary tree is a data structure, whose shape is like a tree (but upside down). It has an starting node called root. The root has two children (pointers to two other nodes), in our case we assign 0 to one child and 1 to the other. If a node has no child it's called a leaf, here it is where there are the symbols. In the example you can see the probability of the node in brackets.

It looks that if we want to assign the right codes, the trick is having similar probabilities at every node. For example if 'a' had a probability of 2, the probabilities at every child of the root wouldl be the same, and the codes would achieve the entropy (a entropy(2/4)=1 bit, b entropy(2/4)=2 bits).

The way to achieve that is making the probability of the current node the sum of the probabilities of its children. That is what the smart David Huffman realized long ago and published in the paper called "A Method for The Construction of Minimum Redundancy Codes" in 1952. In the next section we'll see his method.
 

Huffman algorithm

To achieve optimality huffman joins the two symbols with lowest probability and replaces them with a new fictive node whose probability is the sum of the other nodes' probabilities (a fictive node is a node that doesn't hold a symbol). The two removed symbols are now two nodes in the binary tree. The fictive node it's their parent and it's not inserted in the binary tree yet. Instead we keep it in the input list. We then repeat the process till the input list is empty. Below you can see a probability distribution and its huffman binary tree.
 
Symbol Probability Code Entropy (ideal code length)
e 3/10 10 (2 bits) 1.737 bits
a 2/10 01 (2 bits) 2.322 bits
o 2/10 11 (2 bits) 2.322 bits
i 1/10 001 (3 bits) 3.322 bits
! 1/10 0001 (4 bits) 3.322 bits
u 1/10 0000 (4 bits) 3.322 bits

Huffman tree with probabilitiesHuffman tree showing codes
Huffman tree with probabilities and Huffman tree showing codes.

In practice we sort the list by the probability (highest probability, first position) instead of searching for the two symbols with lowest probability. That way we can directly get the last two nodes and put them on the output binary tree. Because we have to put in the same list a new fictive node, the entries of the input list use the same data structure as the binary tree. In fact before sorting we copy the probabilities from the model to the input list. When there's only one element left on the input list, the binary tree is done. That node is the root. Now we could already make the codes.

Before having a closer look to the code, let's see the example shown above step by step.
 

Huffman example step by step

To start with we sort the list. The binary tree is still empty, while the list is full.

LIST:
e 3
a 2
o 2
i 1
u 1
! 1

BT:
...

Now we join the two last symbols (in the implementation they'll be nodes in fact) into a single one, the fictive node 1 (f1). Then we put those two nodes in the output binary tree. Fictive node 1 points to them (a fictive node is a node that doesn't hold a symbol). Now we sort the list, to be sure that the two last nodes are the ones with lowest probability.

LIST:
e 3
a 2
o 2
f1 2 - u i
i 1

BT:
u 1
! 1

We do again the same operations. Now we have stored the fictive node 1 in the output binary tree. Notice that sorting after removing the two nodes and putting f2 now will have an impact on the binary tree. Remember that we sorting them to have the probabilities of the children balanced (recall the example). Also have a look at f2: it points to 'i' (a symbol) and to 'f1' (a fictive node).

LIST:
e 3
f2 3 - f1 i
a 2
o 2

BT:
f1 2 - u i
i 1
u 1
! 1

We keep making the binary tree. The pointers in a fictive node point to the binary tree.

LIST:
f3 4 - a o
e 3
f2 3 - f1 i

BT:
a 2
o 2
f1 2 - u i
i 1
u 1
! 1

Fortunately we have algorithms and loops which do such repetitive task for us.

LIST:
f4 6 - f2 e
f3 4 - a o

BT:
e 3
f2 3 - f1 i
a 2
o 2
f1 2 - u i
i 1
u 1
! 1

And the last step. There's only one node left in the input list. This is the root, the start of the binary tree.

LIST:
f5 10 - f4 f3

BT:
f4 6 - f2 e
f3 4 - a o
e 3
f2 3 - f1 i
a 2
o 2
f1 2 - u i
i 1
u 1
! 1

Now let's see how does our binary tree look like.

Huffman tree with probabilities

Now we can make the code table by recursion: a function is called with a pointer to the root, it then calls itself with the pointer of the left child, and after that with a pointer to the root node. But before calling, it checks if current node is a leaf (if it contains a symbol), in that case it writes out the code and code length of that symbols and returns. If we do this process with our current binary tree, we would get the following code table (assuming 0 left, 1 right).
 
Symbol Probability Code Entropy (ideal code length)
e 3/10 10 (2 bits) 1.737 bits
a 2/10 01 (2 bits) 2.322 bits
o 2/10 11 (2 bits) 2.322 bits
i 1/10 001 (3 bits) 3.322 bits
! 1/10 0001 (4 bits) 3.322 bits
u 1/10 0000 (4 bits) 3.322 bits

Because the codes are restricted to be of an integer length, huffman can't achieve the ideal code length, it's however close. In practice we restrict the code length to 32 bits. Our codes are made left to right, when outputting them, the first bit to output is the leftmost one. In the case of 'a' first we output '1' and then '0'. Use my bitio functions, which take account of that case.

Now that we have seen the idea about what do we have to do and how, let's see how do we have to implement it and which data structures should we use. We'll see the code for this elegant algorithm. You can read now Implementing huffman code construction and its data structures.
 



First version of this article: 11-Sept-2000 Last update of this article: 17-Sept-2000
This is part of Compression programming by Arturo Campos (email address: arturo@arturocampos.com).
http://www.arturocampos.com Visit again soon for new and updated compression articles and software.