Disclaimer
Copyright (c) Arturo San Emeterio Campos 1999. All rights reserved.
Permission is granted to make verbatim copies of this document for private
use only.
Download
Download the whole article zipped.
Table of contents
Introduction
This article will discuss about how to implement a very simple compression
algorithm rle. Due to the fact that it's very easy to see, it can be a
good introduction to programmers interested in data compression. Rle is
mainly used to compress runs of the same byte. However rle can also be
good for the first stage of our bwt compressor, because we'll avoid too
much time sorting strings that are equal due to the fact that there's a
lot of runs with the same bytes in the data.
Rle
The rle algorithm maybe adapted to suit your needs, but the main idea
is to replace runs of the same data (like 'aaaaa' or '00000') with a counter
saying how many repetitions we have. The idea is quite simple, if we have
'aaaa' we just output 'a', and then a byte, 3 meaning that there was 3
bytes more of the same kind. The first scheme we have seen had a big problem,
'aaaa' will of course be compressed ok, but what about 'bc'? it will be
compressed as b,0,c,0 and this is not a good idea, because it will expand
the data instead of compressing it, we have to be more precise. What about
something like that: if we found two bytes repeated then we output
the number of times we have the same byte again. Example: 'aaaa' will be
encoded as 'aa',2 and 'bc' will be encoded as 'bc'. This scheme is good
enough for our needs. We also could make our scheme try to put a byte with
the number of equal bytes after three or more equal bytes.
Encoder
As you've saw the implementation is very easy, you get two bytes, if
they are equal output both of them, and then count how many bytes equal
to the first you have, then output this value, and continue encoding, of
course you have to discard the repeated bytes. Note that the value can't
be greater than 255, we are using a byte to represent it. If the bytes
were not equal, then output the first, make the second the first, and get
the next byte as a second, and start again.
Decoder
The decoder is easier than the encoder, remember: a decoder almost
always is easier than an encoder. As before, pseudo code, just rewrite
it in your favourite language.
Example
A little example, we have the file: "aaaaaabcddccc" The encoder should
output that: a,a,4,b,c,d,d,0,c,c,1 And the decoder will be able to
decompress it.
Closing words
If you find any mistake in this article or want something to improve
email
me. Coz you're reading an article about rle I suppose that you are
a beginner, in this case I recommend you to learn lz77
or lzp. Good luck. If you find any mistake
in this article, email me.
Contacting the
author
You can reach me via email at: arturo@arturocampos.com
Also don't forget to visit my home page http://www.arturocampos.com
where you can find more and better info. See you in the next article!
This article comes from Arturo Campos home page at http://www.arturocampos.com Visit again soon for new and updated compression articles and software.