What's compression, who may be interested and why, and how I'll explain it
by
Arturo San Emeterio Campos
Disclaimer
Copyright (c) Arturo San Emeterio Campos 2000. All rights reserved.
Permission is granted to make verbatim copies of this
document for private use only.
Note
"Compression programming" is at an early stage. However, the text itself
is complete and will undergo only minor modifications.
Table of contents
What is data compression?
What is data compression used for?
Why should I use data compression?
What do I need to understand the articles?
What exactly will you explain?
How will you explain it?
Closing words and where we go from here
What is data compression?
Data compression makes a file smaller by predicting which bytes are most frequent and storing them in fewer bits. Thus a compressor performs at least two different tasks: predicting the probabilities of the input, which is done by a model, and generating codes from those probabilities, which is done by a coder. Optionally, some data such as audio or images may be transformed or quantized to achieve more compression.
A compressor can be lossy or lossless. With a lossless compressor and
decompressor, the original and decompressed files are identical bit for
bit. On the other hand, with some other kinds of data we can gain a lot of
compression efficiency by throwing away most of it, without losing
much quality. We use lossless compression for text or binary data, and
lossy compression for signals such as audio, images or video.
What is data compression used for?
For transmitting or storing the same information in fewer bits. This has many important applications, but in all of them the result is the same: we waste fewer resources, such as hard disk space, money or time.
For storage, compression is used in backups, for images and sound, for the internal data of programs, or to compress the program itself. Real applications can be found in image standards like JPEG and audio standards like MP3, both of which helped a lot in the multimedia revolution. Another very important application is archivers, commonly called compressors, such as ZIP or ARJ.
If data compression was a valuable help for storage, it has been even more
so for transmission. In the year 2000, a typical internet connection gives around
4 kilobytes per second. What about downloading an uncompressed 400x500 image
with 32 bits of color (800 KB)? Or a 4 MB audio file which contains only one
minute of sound? Or downloading an 8 MB data file instead of a compressed
version of it which takes just 2 MB? Obviously compression reduces transmission
time, which in turn makes communications cheaper.
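The download times behind these questions can be worked out directly. This assumes 1 KB = 1000 bytes (with 1024 the figures change by only a few percent) and a steady 4 KB/s:

```latex
\begin{aligned}
\text{image size} &= 400 \times 500 \ \text{pixels} \times 4 \ \text{bytes/pixel} = 800\,000 \ \text{bytes} \approx 800 \ \text{KB} \\
t_{\text{image}} &= \frac{800 \ \text{KB}}{4 \ \text{KB/s}} = 200 \ \text{s} \approx 3.3 \ \text{min} \\
t_{\text{audio}} &= \frac{4000 \ \text{KB}}{4 \ \text{KB/s}} = 1000 \ \text{s} \approx 17 \ \text{min} \\
t_{8\,\text{MB}} &= \frac{8000 \ \text{KB}}{4 \ \text{KB/s}} = 2000 \ \text{s} \approx 33 \ \text{min},
\qquad t_{2\,\text{MB}} = \frac{2000 \ \text{KB}}{4 \ \text{KB/s}} = 500 \ \text{s} \approx 8 \ \text{min}
\end{aligned}
```

So one minute of uncompressed sound takes about 17 minutes to download, and compressing the 8 MB file to 2 MB cuts the wait from about 33 minutes to about 8.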
Why should I use data compression?
Just tell your boss that you can make hard disks twice as big and your local area network twice as fast, and that this costs less money and takes less time than buying and installing new ones. He'll surely choose your proposal.
As a software developer, you may want to implement a patent-free compressor for your own application.
Students may find it interesting to see the practical side of information
theory, or of their programming subjects.
What do I need to understand the articles?
You must be a programmer. Your prior knowledge of compression doesn't matter, but you should know the C programming language, or another high-level language like it, because many articles use C-like pseudocode to explain the ideas further.
Students, or people who just want a feel for what compression is, can
also read The introduction to Data Compression and get a clear picture
of compression.
What exactly will you explain?
I'll explain the coders and models used today, and how to plug them together to get a working compressor. The articles focus on lossless data compression.
If you want to do lossy compression, however, you'll also find useful articles. In that case, the best you can do is read the whole introduction and learn some coders. Optionally, PPM variants would be a good bet.
In the table of contents you can find all
the articles.
In the first article I'll give you a more in-depth, general explanation of compression. Then I'll talk about the most widely used algorithms today and explain the differences between them, so you can choose the one you need.
From that point on, you can go straight to the section you need to make your compressor. The rest of the articles are highly practical, but I will explain the basic theory too.
How will you explain it?
I wrote the articles so that they are easy to scan, skip and jump all
over. They explain the most important facts and ideas first, and then
go into details.
Closing words and where we go from here
I offer you the articles for free; in exchange, I only ask you to give me your opinion of them. If you have any questions while reading, send me an email and I'll answer as soon as possible. I look forward to your feedback: if you took the time to read this, take the time to tell me what you think of it.
If you have decided to learn compression, start with The
introduction to data compression, or have a look at the table
of contents.
First version of this article: 15-July-2000
Last update of this article: 09-Sept-2000
This is part of Compression programming by Arturo Campos (email
address: arturo@arturocampos.com).
http://www.arturocampos.com
Visit again soon for new and updated compression articles and software.