"Run Length Encoding"
by
Arturo San Emeterio Campos





Disclaimer
Copyright (c) Arturo San Emeterio Campos 1999. All rights reserved. Permission is granted to make verbatim copies of this document for private use only.
 

Download
Download the whole article zipped.
 

Table of contents

  • Introduction
  • Rle
  • Encoder
  • Decoder
  • Example
  • Closing words
  • Contacting the author

  •  

     
     
     
     

    Introduction
    This article will discuss about how to implement a very simple compression algorithm rle. Due to the fact that it's very easy to see, it can be a good introduction to programmers interested in data compression. Rle is mainly used to compress runs of the same byte. However rle can also be good for the first stage of our bwt compressor, because we'll avoid too much time sorting strings that are equal due to the fact that there's a lot of runs with the same bytes in the data.
     

    Rle
    The rle algorithm maybe adapted to suit your needs, but the main idea is to replace runs of the same data (like 'aaaaa' or '00000') with a counter saying how many repetitions we have. The idea is quite simple, if we have 'aaaa' we just output 'a', and then a byte, 3 meaning that there was 3 bytes more of the same kind. The first scheme we have seen had a big problem, 'aaaa' will of course be compressed ok, but what about 'bc'? it will be compressed as b,0,c,0 and this is not a good idea, because it will expand the data instead of compressing it, we have to be more precise. What about something like that: if we found two bytes repeated then we output the number of times we have the same byte again. Example: 'aaaa' will be encoded as 'aa',2 and 'bc' will be encoded as 'bc'. This scheme is good enough for our needs. We also could make our scheme try to put a byte with the number of equal bytes after three or more equal bytes.

    Encoder
    As you've saw the implementation is very easy, you get two bytes, if they are equal output both of them, and then count how many bytes equal to the first you have, then output this value, and continue encoding, of course you have to discard the repeated bytes. Note that the value can't be greater than 255, we are using a byte to represent it. If the bytes were not equal, then output the first, make the second the first, and get the next byte as a second, and start again.

    Ok? Well, now go and implement it. of course both the encoder and the decoder end when have read all the bytes, I suggest you keeping a variable, or a register with the total length, or something like that.
     

    Decoder
    The decoder is easier than the encoder, remember: a decoder almost always is easier than an encoder. As before, pseudo code, just rewrite it in your favourite language.
     

    Here it is, if you implement it, you have a Rle with byte based output.
     

    Example
    A little example, we have the file: "aaaaaabcddccc" The encoder should output that:  a,a,4,b,c,d,d,0,c,c,1 And the decoder will be able to decompress it.

    Closing words
    If you find any mistake in this article or want something to improve email me. Coz you're reading an article about rle I suppose that you are a beginner, in this case I recommend you to learn lz77 or lzp. Good luck.  If you find any mistake in this article, email me.
     

    Contacting the author
    You can reach me via email at: arturo@arturocampos.com  Also don't forget to visit my home page http://www.arturocampos.com where you can find more and better info. See you in the next article!
     
     

    Arturo San Emeterio Campos, Barcelona 24-Jul-1999


    This article comes from Arturo Campos home page at http://www.arturocampos.com Visit again soon for new and updated compression articles and software.