Huffman algorithm, making codes from probabilities
by
Arturo San Emeterio Campos
Disclaimer
Copyright (c) Arturo San Emeterio Campos 2000. All rights reserved.
Permission is granted to make verbatim copies of this
document for private use only.
Note
"Compression programming" is on a early stage. However the text itself
is complete and will suffer only minor modifications.
Table of contents
Making binary codes
from probabilities
Huffman algorithm
Huffman example step by step
Making binary codes from probabilities
We have a text like "aaabc", with the probabilities you can see below.
Because we can output bits, what about if we make a binary decision? the
most probable or the rest, 0 or 1. In our example, 0 for 'a', then 1 for
the rest, 'b' and 'c', and then we have to choose again: 0 for 'b' and
1 for 'c'. Using that codes we can output "01001110" and the decoder will
correctly decode "abacb". That is because the codes have the prefix property
and also are instantaneously decodable. Notice too that they are close
to the entropy.
| Symbol | Probability | Code | Entropy (ideal code length) |
| a | 3/5 | 0 (1 bit) | 0.737 bits |
| b | 1/5 | 10 (2 bits) | 2.322 bits |
| c | 1/5 | 11 (2 bits) | 2.322 bits |
Because we are doing a binary decision, we could represent it with a binary tree. A binary tree is a data structure, whose shape is like a tree (but upside down). It has an starting node called root. The root has two children (pointers to two other nodes), in our case we assign 0 to one child and 1 to the other. If a node has no child it's called a leaf, here it is where there are the symbols. In the example you can see the probability of the node in brackets.
It looks that if we want to assign the right codes, the trick is having similar probabilities at every node. For example if 'a' had a probability of 2, the probabilities at every child of the root wouldl be the same, and the codes would achieve the entropy (a entropy(2/4)=1 bit, b entropy(2/4)=2 bits).
The way to achieve that is making the probability of the current node
the sum of the probabilities of its children. That is what the smart David
Huffman realized long ago and published in the paper called "A Method for
The Construction of Minimum Redundancy Codes" in 1952. In the next section
we'll see his method.
To achieve optimality huffman joins the two symbols with lowest probability
and replaces them with a new fictive node whose probability is the sum
of the other nodes' probabilities (a fictive node is a node that doesn't
hold a symbol). The two removed symbols are now two nodes in the binary
tree. The fictive node it's their parent and it's not inserted in the binary
tree yet. Instead we keep it in the input list. We then repeat the process
till the input list is empty. Below you can see a probability distribution
and its huffman binary tree.
| Symbol | Probability | Code | Entropy (ideal code length) |
| e | 3/10 | 10 (2 bits) | 1.737 bits |
| a | 2/10 | 01 (2 bits) | 2.322 bits |
| o | 2/10 | 11 (2 bits) | 2.322 bits |
| i | 1/10 | 001 (3 bits) | 3.322 bits |
| ! | 1/10 | 0001 (4 bits) | 3.322 bits |
| u | 1/10 | 0000 (4 bits) | 3.322 bits |

Huffman tree with probabilities and Huffman tree showing codes.
In practice we sort the list by the probability (highest probability, first position) instead of searching for the two symbols with lowest probability. That way we can directly get the last two nodes and put them on the output binary tree. Because we have to put in the same list a new fictive node, the entries of the input list use the same data structure as the binary tree. In fact before sorting we copy the probabilities from the model to the input list. When there's only one element left on the input list, the binary tree is done. That node is the root. Now we could already make the codes.
Before having a closer look to the code, let's see the example shown
above step by step.
To start with we sort the list. The binary tree is still empty, while the list is full.
LIST:
e 3
a 2
o 2
i 1
u 1
! 1
BT:
...
Now we join the two last symbols (in the implementation they'll be nodes in fact) into a single one, the fictive node 1 (f1). Then we put those two nodes in the output binary tree. Fictive node 1 points to them (a fictive node is a node that doesn't hold a symbol). Now we sort the list, to be sure that the two last nodes are the ones with lowest probability.
LIST:
e 3
a 2
o 2
f1 2 - u i
i 1
BT:
u 1
! 1
We do again the same operations. Now we have stored the fictive node 1 in the output binary tree. Notice that sorting after removing the two nodes and putting f2 now will have an impact on the binary tree. Remember that we sorting them to have the probabilities of the children balanced (recall the example). Also have a look at f2: it points to 'i' (a symbol) and to 'f1' (a fictive node).
LIST:
e 3
f2 3 - f1 i
a 2
o 2
BT:
f1 2 - u i
i 1
u 1
! 1
We keep making the binary tree. The pointers in a fictive node point to the binary tree.
LIST:
f3 4 - a o
e 3
f2 3 - f1 i
BT:
a 2
o 2
f1 2 - u i
i 1
u 1
! 1
Fortunately we have algorithms and loops which do such repetitive task for us.
LIST:
f4 6 - f2 e
f3 4 - a o
BT:
e 3
f2 3 - f1 i
a 2
o 2
f1 2 - u i
i 1
u 1
! 1
And the last step. There's only one node left in the input list. This is the root, the start of the binary tree.
LIST:
f5 10 - f4 f3
BT:
f4 6 - f2 e
f3 4 - a o
e 3
f2 3 - f1 i
a 2
o 2
f1 2 - u i
i 1
u 1
! 1
Now let's see how does our binary tree look like.
Now we can make the code table by recursion:
a function is called with a pointer to the root, it then calls itself with
the pointer of the left child, and after that with a pointer to the root
node. But before calling, it checks if current node is a leaf (if it contains
a symbol), in that case it writes out the code and code length of that
symbols and returns. If we do this process with our current binary tree,
we would get the following code table (assuming 0 left, 1 right).
| Symbol | Probability | Code | Entropy (ideal code length) |
| e | 3/10 | 10 (2 bits) | 1.737 bits |
| a | 2/10 | 01 (2 bits) | 2.322 bits |
| o | 2/10 | 11 (2 bits) | 2.322 bits |
| i | 1/10 | 001 (3 bits) | 3.322 bits |
| ! | 1/10 | 0001 (4 bits) | 3.322 bits |
| u | 1/10 | 0000 (4 bits) | 3.322 bits |
Because the codes are restricted to be of an integer length, huffman can't achieve the ideal code length, it's however close. In practice we restrict the code length to 32 bits. Our codes are made left to right, when outputting them, the first bit to output is the leftmost one. In the case of 'a' first we output '1' and then '0'. Use my bitio functions, which take account of that case.
Now that we have seen the idea about what do we have to do and how,
let's see how do we have to implement it and which data structures should
we use. We'll see the code for this elegant algorithm. You can read now
Implementing
huffman code construction and its data structures.