Background Information

The Discrete Cosine Transform (DCT) is very similar to the DFT and is also commonly used in compression by filtering out unnecessary frequencies present in the transformed image. JPEG employs this method of image compression due to its good ratio of compression to similarity.

The 1 dimensional DCT is defined in the following equations:

Screen Shot 2017-12-10 at 8.18.12 PM

We can see this transform is very similar to the DFT, as , but this method will only produce real valued DCT coefficients given the lack of the imaginary sine term.

Because we’re looking at image compression, the 2D version of DCT is what we are concerned with, and is defined as follows:

Screen Shot 2017-12-10 at 8.18.18 PM

It should be noted that in both versions of the DCT, the choice of the scaling factor in front of the first indexed value can vary from implementation to implementation, just as we saw with the DFT. Matlab uses a scaling factor of in order to make the basis matrix orthogonal. Also, because images are represented as a stack of three matrices (RBG), the transform and compression algorithm must be applied to each layer of the image and then recompiled.

Instead of applying the transform to the entire image, we apply it to 8 x 8 blocks across and down the image. An 8 x 8 collection of DCT basis signals looks like this:

Screen Shot 2017-12-10 at 5.59.44 PM

This basis is a collection of signals that spatially sort frequencies present in the image. As can be seen above, the lowest frequencies in x and y are arranged into the top-left and the highest are in the bottom-right, so that frequency increases along the red line. Most of the important visual information in this transformed representation is contained in the bits corresponding to low frequencies, so the image can easily be compressed by filtering out certain amounts of the higher frequency bits. This is done by applying an 8 x 8 “masking” matrix to each block of the transformed image. A few different masking matrices are shown below, which zero out 63, 61, 58 and 54 values out of 64 respectively.

Screen Shot 2017-12-10 at 8.18.28 PM

Note that the mask number corresponds to the number of ones coming out of the top left corner and goes from 1-14. As it is a triangle visually, the number of values retained in each block based on the mask number is the triangular of the mask number. Masks 0 and 15 are irrelevant as they produce a black image and recreate the original image respectively.

By calculating the size the transformed representation of the image as well as its masked version, we can establish what compression ratio we’ve achieved. Then, after transforming the images back with the inverse DCT, we can use Frobenius norms to find out how similar the recreated compressed image is to the original.

Share this: