Disclaimer
Copyright (c) Arturo San Emeterio Campos 1999. All rights reserved.
Permission is granted to make verbatim copies of this document for private
use only.
Download
Download the whole article, also
with the source code.
Table of contents
Introduction
This article is aimed to a programmer who has implemented Huffman,
and wants it to be bug free. If you still don't know huffman learn
it.
Fibbonaci series
I'll only explain the basics of it, if you want to know how to do an
efficient implementation on it, look for an article about it in my h-page.
(others) Fibbonaci series are defined that way:
Fibbonaci and Huffman
The problem comes when we have a set of probabilities, which is a fibbonaci
serie. Look at this probabilities:
|
|
Probability |
|
|
1/19 |
|
|
2/19 |
|
|
3/19 |
|
|
5/19 |
|
|
8/19 |
Those probabilities are Fib(6). (the 0 probability is not used) This will make the following huffman tree (when n means node):
Root
/ \
e n
/ \
d n
/ \
c n
/ \
b
a
With the following codes:
|
|
Probability | Code |
|
|
1/19 | 0000 |
|
|
2/19 | 0001 |
|
|
3/19 | 001 |
|
|
5/19 | 01 |
|
|
8/19 | 1 |
This kind of tree is called lopsided, or unbalanced. As we can see we
the maximum length is alphabetsize-1, where alphabetsize=5. (the different
symbols) And here it is the problem, we only had 5 symbols, but if we had
256 symbols our maximum code could be 255! and usually we design our codecs
to work with codes of 16 bits long, and even 32 bits long, but never
255 bits long.
Solution
Most implementations use 16 bits code -- no need of more -- so I'll
deal with how to avoid it with 16 bits codes. The maximum code length is
-- obviously -- 16, so we'll have a problem if we have a set of probabilities
till fib(17), but for getting probabilities till fib(x) we need to get
bytes:
x
S fib(i)
i=0
So for getting the fibbonaci probabilities for fib(6), we'll need to
get at least: 0+1+1+2+3+5+8=20 bytes. And for Fib(16+1) we'll need 4180.
So what we have to do? we'll change the probabilities, so they don't make
an unbalanced tree, we'll do that dividing the probabilities by a factor,
we'll choose 2. (just a shift right) First you gather the probabilities,
if all the probabilities (added) make 4180, then you change the probabilities,
we divide every probability by 2. But be careful, no probability should
go beyond 1, or then you'll not assign any code to that probability. This
operation is usually done shifting the value to the right one position,
and perform an or with the value 1. Once it's less than 4180 you can construct
the tree.
My implementation
In fact, all this theory is ok, but this wasn't how I did it. Because
I wanted teh maximum length of a code to be 16, I kept a flag, if once
any code went above 16, then I put this flag to 1, when finished doing
the codes, I checked if the flag was set, if it was, then I divided all
the probabilities by 2. Remember to store the original probabilities apart
(so they aren't changed while building the tree) and to copy the new probabilies
in the old ones, so you don't enter an infinite loop in the case that the
first division can't make your maximum code length 16.
Closing words
In the zip there's also a little program in C which
prints fib(x) and also computes this:
x
S fib(i)
i=0
Well, if ever your huffman code hanged, and you didn't knew why, this
is -- probably -- the solution for it. If you've implemented huffman you
should also do canonical huffman.
If you found any errors, and have any idea about how to improve this article,
email-me.
Contacting the
author
You can reach me via email at: arturo@arturocampos.com
Also don't forget to visit my home page http://www.arturocampos.com
where you can find more and better info. See you in the next article!
This article comes from Arturo Campos home page at http://www.arturocampos.com Visit again soon for new and updated compression articles and software.