Huffman introduction
by
Arturo San Emeterio Campos
Disclaimer
Copyright (c) Arturo San Emeterio Campos 2000. All rights reserved.
Permission is granted to make verbatim copies of this
document for private use only.
Note
"Compression programming" is on a early stage. However the text itself
is complete and will suffer only minor modifications.
Table of contents
What is huffman?
What do I need to
know to implement it?
How will you explain it?
Closing words
Huffman is a coding algorithm presented by David Huffman in 1952. It's an algorithm which works with integer length codes. In fact if we want an algorithm which does integer length codes, huffman is the best option because it's optimal.
We use huffman for example, for compressing the bytes outputted by lzp. First we have to know the probabilities of them, we use a qsm model for that matter. Based on the probabilities it makes the codes which then can be outputted. Decoding is more or less the reverse process, based on the probabilities and the coded data, it outputs the decoded byte.
To make the probabilities the algorithm uses a binary tree. It stores
there the symbols and their probabilities. The position of the symbol depends
on its probability. Then it assigns a code based on its position in the
tree. The codes have the prefix property and are instantaneously decodable
thus they are well suited for compression and decompression.
What do I need to know to implement it?
You must have implemented bitio functions.
Also you need a model which predicts the probability of the input. Qsm
is the best option because it's dynamic and thus we don't need to output
the code table beforehand.
First in Huffman algorithm, making codes from
probabilities I'll explain how does the algorithm works. Those who
already know it can skip this section. Still you can find an explanation
of why huffman works that way. If you see programming as a challenge, you
may want to skip the next parts: Implementing huffman
code construction and its data structures and Implementing
huffman decoding, that cover the data structures and how to use them.
Anyway just after implementing huffman coding read Restricting
huffman codes to a maximum length, which is needed for all the practical
implementations.
There are some ways to speed decoding. I recommend checking Michael Schindler's http://www.compressconsult.com.
If you need more compression than huffman can offer, check any arithmetic
coder.
First version of this article: 21-July-2000 Last update of this article:
17-Sept-2000
This is part of Compression programming by Arturo Campos (email
address: arturo@arturocampos.com).
http://www.arturocampos.com
Visit again soon for new and updated compression articles and software.