In this blogpost, I'm going to talk about the JPEG compression algorithm and how it works. It's a very ingenious algorithm that diminishes the file size of a photograph. The JPEG compression is a lossy compression algorithm (altough a lossless variant exists) that throws some information away (that's almost invisible) and in that way decreases the filesize.
Discrete Cosine Transform (DCT)
Now, how does this work? First of all I'm going to talk about the Discrete Cosine Transform (DCT). This transform decomposes a vector of values into cosine base functions. It thus transforms the values from the time domain to the frequency domain. The higher the frequency, the more the image (or some other signal) changes over a certain period of time. JPEG uses the YCbCr-color model. Y is the luminance and Cb and Cr are chrominances (colors). An image is divided into blocks of 8 by 8 pixels and this for every channel. Every pixel in such a block contains a specific value (either for Y, Cb or Cr). By applying the 2 dimensional DCT (because a block of 8 by 8 pixels consists of rows and columns), we transform every pixel value into a weight for some cosine base function. Every block can be reconstructed by adding a certain amount of some cosine base functions together. These base functions look like this:
We're going to describe the full JPEG-encoding process by an example. The next picture gives the Y-channel information for a certain 8 by 8 block of an image.
The Y-values are given by the following matrix:
Now, before transforming these values by the DCT, we have to center the values around zero. Our original range was [0, 255] and we map this onto the range [-128, 127] by simply subtracting 128 from each value. We then find the following matrix:
Now, we transform these by using the 2D DCT which is given by the following formula:
We find the following matrix. Note that all values are now weights specifiying what the role of a certain cosine base function in our final result is. One special property is that in almost all cases the higher frequencies will have a much lower value than the lower frequencies because there aren't much changes in an 8 by 8 block (the lower frequencies are situated in the left upper corner and the higher frequencies in the right lower corner).
Quantization is the process where a set of values is mapped onto a smaller set of values to save disk space. Less different values need less bits to encode. This is a quite simple process. Quantization is usually accomplished by dividing all values by some integer and then rounding the results to integers. JPEG uses quantization tables that define which weight has to be divided by which number. These tables are designed to divide the weights of the higher frequencies by higher numbers, because higher frequencies have a smaller impact on the final result. In the above example, we can use for example the following quantization table:
This quantization table is the one that's used by the official JPEG-specification for the 50%-quality setting. Every quality setting has a different quantization table. The higher the quality settings, the smaller the numbers by which we divide.
So, for the first element of the weight table we find |-415.38| / 16 = 25.9375. We then have to round this value to the nearest integer, giving us 26. We repeat this process for every element in the weights-matrix. The results are given by:
Compressing (Entropy coding)
The entropy in compression algorithms is the minimum amount of bits needed to code all symbols in a given information source (for example the values we found above). It's a lower limit that enables us to compare several lossless compression algorithms. Note that we are talking about lossless compression algorithms here. This may sound strange as JPEG is a lossy compression algorithm, but the lossy-step (involving loss of information) is performed by the quantization. The quantized values are now encoded by an algorithm known as the Huffman coding. This is optimal and does a very good job (Huffman coding was the most optimal lossless algorithm for quite some time, but it's outperformed by Arithmetic coding the last years). Before applying the Huffman coding, the algorithm creates a bitstream by zigzagging through the weighted values, as seen in this picture:
The bitstring that results from applying the Huffman coding to the found values is saved. All other information is discarded, thus saving a lot of disk space! Decoding JPEG works by reversing the process.
Post created by Pieter Verschaffelt on 2016-06-28 19:24:45