Using linked lists to access higher orders
by
Arturo San Emeterio Campos








Disclaimer
Copyright (c) Arturo San Emeterio Campos 2000. All rights reserved. Permission is granted to make verbatim copies of this
document for private use only.

Note
"Compression programming" is on a early stage. This text is not finished.
 

Table of contents
Why do we have to use linked list?
What is a linked list?
How do we use a linked list?
Data structures
Implementing finding a context
Implementing adding a new element and updating
 
 

Why do we have to use linked list?

In order-3 hashed different contexts share the same hash entry. We can use singly linked lists to solve this problem. Linked list are an extension of hash tables used for such cases when you don't know before hand the number of elements that will be inserted in the same entry.

As an overview I would say that in the hash entry we have the context information, which in a simple implementation is only a pointer to the data. Instead we put a pointer to the linked list, which then has the different elements that share the same entry. Special care has to be taken with the managing of the data structures, now before using an element we'll have to find it.
 

What is a linked list?

A linked list is a data structure. You can think of it like an array. However the linked list is for when you expect that you won't need too many entries, or you don't know the number of entries used before hand. That's exactly the case of lzp if using hashing. We can know the maximum number of contexts that may share the same entry. But we don't know how many context will in practice share it.

Linked lists allocate memory for an element when it's added. That's what makes them useful for lzp because in practice though a lot of contexts may appear, just a few do. Due to the allocation, you can't find every element one after another in memory. That is why we must have a way to know where are they located: For the first allocated element that's easy, we put a pointer to it in the hash entry. For the second one we put a pointer in the first one to the second. For the third one we put a pointer in the second one and so on. In the last element the pointer to the next one equals 0. That's the way we know when we have reached the end of the linked list. In short words, an element in a linked list has the context information plus a pointer to the next element.

In the original order-2 lzp the only information that had to be stored about the context, was a pointer to the content of it. We didn't need more because every context had its own entry. But now different contexts are in the same linked list (because they share the same hash entry). Thus we need a way to know the difference between the elements. That's obviously the context. So we also store a copy of the context in the element. Let's see the data used to access all the contexts belonging to the same hash entry.

Hash table: pointer to element 34

Linked list:
Element 34. Context "he". Pointer to data. Pointer to element 43
Element 43. Context "ht". Pointer to data. Pointer to element 56
Element 56. Context "ha". Pointer to data. Pointer equals 0.
 

How do we use a linked list?

We'll use the linked list to put more than one context in the same hash entry. That means we have three kind of operations. Adding a new context, looking for a context or updating a context. When doing any of the operations, there can be two cases:

1- There's no linked list, pointer in the hash table equals 0
2- There's a linked list, hash table entry points to first element

The first thing to do is to search for a given context. Before reading the linked list, we have to check that it exist. This is done by checking the pointer in the hash entry corresponding to the current context. If it equals 0 we know it doesn't exist, and thus the compressor has to act like if there was no context. Then we have to read the whole linked list till we find the context, if we don't, we make the same as if there was no linked list at all. Otherwise return the pointer.

Adding a new element is more or less the same. In that case you have to read till the end of the linked list. Now allocate memory for the new element, and then put pointer to this new element in the last one. Now we only have to init the values of the new element.

Updating is the same as searching one element. But once we find it we update its context information instead of returning a pointer to it. However we don't need to read the linked list. That's because if we update a context that means it existed, and that we searched it. Thus if after finding the element we keep a pointer to it, we don't need to read the linked list twice.
 

Data structures

The data structures that we'll use in the examples:

- hash_table, this table stores in every entry a pointer to the linked list of the contexts that share the same entry.
- pointer_to_ll, pointer to a linked list which is an structure that has the following data:
  - context, it stores the context that makes every element different. For example if we were using order-2 for hash table (with 2^16 entries) and we wanted to access order-3 with the linked list, we would only need to store the third context byte. But in case of order-3 hashed we store the whole order-3.
  - offset, pointer to the content, used to match against.
  - nextll, pointer to next element in the linked list.
-nextfreenode is a variable, a pointer_to_ll. It's the address of the memory allocated for a new element. It's up to you to allocate and free this memory.
 

Implementing finding a context

The function of finding a context has to return a value: 0 in case that the element didn't exist, or a pointer to the element. Also have to store in a global variable the pointer, so the updating function can use it.

 //First check if the hash table is initialized
 if(hash_table[hash_entry] == 0)
   {
   return 0  //no context found
   }

 //Now read the linked list till we find it
 while(pointer->context != context)
   {
   //Check if this is the last element
   if(pointer->nextllnode == 0)  //last element
     {
     //Store pointer to end of the linked list, for
     //faster updating
     usedllnode=pointer;
     //offset not found (no matching context)
     return 0;
     }
   pointer=pointer->nextllnode;  //next pointer
   }
 usedllnode=pointer;  //store pointer for updating
 return pointer->offset;  //return offset to match
 

Implementing adding a new element and updating

In the main loop of lzp, we have to create a new element in two different cases:

1-There was no context, the hash entry was not initialized. In that case we have to put in the hash table a pointer to the new one. And create the first element.
2-There was no matching context, but there existed a linked list in the hash entry. Then we have to create a new element at the end of the linked list.

In both cases the rest of the code is the following:

 //put context
 nextfreenode->context=context;
 //put pointer to match
 nextfreenode->offset=pointer_bytes;       //mark as last element in the linked lists
 //now this is the last node
 nextfreenode->nextllnode=0;

Updating is very simple. Because you stored a pointer to the element in usedllnode, you just have to access it and modify it as desired.
 


First version of this article: 19-Sept-2000 Last update of this article: 25-Sept-2000
This is part of Compression programming by Arturo Campos (email address: arturo@arturocampos.com).
http://www.arturocampos.com Visit again soon for new and updated compression articles and software.