There are several intuitive ways to losslessly encode image data. The simplest approach is to simply code each pixel independently. Alternatively, pairs of pixels may be coded together, since neighbouring pixels will often have a high degree of correlation. If successful, these methods will lead to coding rates lower than the raw 8 bits per pixel.
By using an appropriate code, the bit rate of the encoded data can come close to the entropy of the data itself. This is achieved by using a code in which the length of each codeword is proportional to the frequency that it is used, such as a Huffman code or arithmetic coding. On the test images this resulted in a rate of 7.39 bits per pixel -- a 7% compression.
Neighbouring pixels are generally highly correlated, as illustrated in Figure 6. In order to exploit this fact, the code may be chosen so that pixels are coded in sequential pairs rather than alone. Since there are more possibilities, this will increase the code rate; on the test images a rate of 11.98 bits per symbol was achieved. Since the symbols each represent 2 pixels, this translates to a normalized rate of 5.99 bits per pixel, or a 25% compression.
Another way to exploit this correlation is to predict the pixel value based on its neighbours, and then to code only the error. The values of the majority of these errors should be a small number of discrete integers, and hence these frequent errors should compress well.
Prediction of a pixel as the value of the adjacent left pixel resulted in a rate of 5.31 bits per pixel for the error, a compression of 33%. This is also known as differential pulse code modulation (DPCM).
It is logical that with more information, prediction could be improved. Prediction using all adjacent received pixels was then implemented using first the minimum variance predictor, and then the minimum entropy predictor [2]. The minimum variance predictor yielded a 4.85 bits per pixel rate for 39% compression. The minimum entropy predictor resulted in a rate of 4.90 bits per pixel, for 38% compression. The 2 predictors are so close that the difference between them in this case is more likely due to image content than general superiority of design.
There are special cases of image transmission in which it is desirable to separate the low-frequency components from high-frequency components. One example is the advantage of progressive transmission of large images over a low-bandwidth connection such as a telephone line, as is used on the World Wide Web. Another example is in distribution of error correction - for instance if the lower frequency components are required to be more robust under high error-rate channels.
This is a motivation for pyramid coding [3] of image data. The image is progressively filtered and subsampled into separate levels of decreasing size, until a very small dataset corresponding to the low-frequency components remains. This level is encoded. It is then interpolated to the size of the previous level, and the difference between the levels is computed. This difference is then encoded. Successive differences between each pair of levels are encoded in this fashion. The lowest level is itself encoded.
Pyramid coding results in a greater number of samples to be encoded
(up to
more if 1:2 subsampling is used in each dimension), but
this tends to be counteracted by the better compressibility of the
seperate levels.
Using optimal entropy encoding, it was found that the average coding rate for a 1:2 subsampling pyramid encoder operating on the test images was 6.76 bits per pixel of the original image. This equates to a 15% compression.
An alternative approach was to find the rate for DPCM coding of the level differences. This approach was unsuccessful: the entropy of the error was greater than that of the data. The rate went up to 7.06 bits per image pixel, or 11% compression.
Lempel-Ziv [4] coding is a general-purpose compression algorithm. It operates on a sliding window of data. The advantage of this approach is that the code dictionary does not need to be sent in addition to the data, as it is implicitly defined by the structure of the output.
A dictionary of variable-length words is built up in the following manner: new data is parsed byte-by-byte until the accumulated sequence does not match a word in the window dictionary. This new n-byte sequence is stored in the dictionary as (a) a reference to the last word matched by the sequence of n-1 previous bytes and (b) the nth byte. Then the data stream is parsed again, building up further new sequences. This is repeated until the end of the data stream is reached. The output from this process then undergoes a secondary Huffman encoding in blocks.
In trials on the test images, LZ77 compression resulted in a bit rate of 6.54 bits per pixel: 18% compression.
![]() |