Huffman introduction
by
Arturo San Emeterio Campos







Disclaimer
Copyright (c) Arturo San Emeterio Campos 2000. All rights reserved. Permission is granted to make verbatim copies of this
document for private use only.

Note
"Compression programming" is on a early stage. However the text itself is complete and will suffer only minor modifications.
 

Table of contents
What is huffman?
What do I need to know to implement it?
How will you explain it?
Closing words
 
 

What is huffman?

Huffman is a coding algorithm presented by David Huffman in 1952. It's an algorithm which works with integer length codes. In fact if we want an algorithm which does integer length codes, huffman is the best option because it's optimal.

We use huffman for example, for compressing the bytes outputted by lzp. First we have to know the probabilities of them, we use a qsm model for that matter. Based on the probabilities it makes the codes which then can be outputted. Decoding is more or less the reverse process, based on the probabilities and the coded data, it outputs the decoded byte.

To make the probabilities the algorithm uses a binary tree. It stores there the symbols and their probabilities. The position of the symbol depends on its probability. Then it assigns a code based on its position in the tree. The codes have the prefix property and are instantaneously decodable thus they are well suited for compression and decompression.
 

What do I need to know to implement it?

You must have implemented bitio functions. Also you need a model which predicts the probability of the input. Qsm is the best option because it's dynamic and thus we don't need to output the code table beforehand.
 

How will you explain it?

First in Huffman algorithm, making codes from probabilities I'll explain how does the algorithm works. Those who already know it can skip this section. Still you can find an explanation of why huffman works that way. If you see programming as a challenge, you may want to skip the next parts: Implementing huffman code construction and its data structures and Implementing huffman decoding, that cover the data structures and how to use them. Anyway just after implementing huffman coding read Restricting huffman codes to a maximum length, which is needed for all the practical implementations.
 

Closing words

There are some ways to speed decoding. I recommend checking Michael Schindler's http://www.compressconsult.com.

If you need more compression than huffman can offer, check any arithmetic coder.
 


First version of this article: 21-July-2000 Last update of this article: 17-Sept-2000
This is part of Compression programming by Arturo Campos (email address: arturo@arturocampos.com).
http://www.arturocampos.com Visit again soon for new and updated compression articles and software.