Computer images are extremely data-intensive and hence require large amounts of memory for storage. As a result, transmitting an image from one machine to another can be very time consuming. Data compression techniques remove some of the redundant information contained in images, so that they require less storage space and less time to transmit. Neural nets can be used for image compression, as shown in the following demonstration.
A neural net architecture suitable for solving the image compression problem is shown below. This type of structure, in which a large input layer feeds into a small hidden layer, which in turn feeds into a large output layer, is referred to as a bottleneck-type network. The idea is this: suppose that the neural net shown below has been trained to implement the identity map. Then a tiny image presented to the network as input would appear exactly the same at the output layer.
Bottleneck-type Neural Net Architecture for Image Compression
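The 64-16-64 architecture described above can be sketched as a forward pass in numpy. This is a minimal illustration, not the demo's actual implementation; the weight values are random placeholders, and the tanh activation is assumed from the fact that the hidden outputs are later said to lie between -1 and 1.

```python
import numpy as np

# Hypothetical 64-16-64 bottleneck network; weights are random
# placeholders standing in for trained values.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(16, 64))   # input -> hidden (the bottleneck)
b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(64, 16))   # hidden -> output
b2 = np.zeros(64)

def forward(x):
    """Map a 64-dimensional input through the bottleneck and back."""
    h = np.tanh(W1 @ x + b1)   # 16 hidden outputs, each in (-1, 1)
    y = np.tanh(W2 @ h + b2)   # 64 reconstructed values
    return h, y

x = rng.uniform(-1, 1, size=64)   # one 8x8 chunk, flattened
h, y = forward(x)
print(h.shape, y.shape)  # (16,) (64,)
```

Once the net has been trained so that y approximates x, the 16-dimensional h carries (nearly) all the information in the 64-dimensional input.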
In this case, the network could be used for image compression by breaking it in two, as shown in the Figure below. The transmitter encodes and then transmits the output of the hidden layer (only 16 values, as compared to the 64 values of the original image). The receiver receives and decodes the 16 hidden outputs and generates the 64 outputs. Since the network implements an identity map, the output at the receiver is an exact reconstruction of the original image.
The Image Compression Scheme using the Trained Neural Net
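Splitting the trained net in two amounts to keeping the first layer on the transmitter side and the second layer on the receiver side. A sketch, again with random placeholder weights standing in for the trained ones:

```python
import numpy as np

# Placeholder weights; in the actual scheme these come from training.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(scale=0.1, size=(16, 64)), np.zeros(16)
W2, b2 = rng.normal(scale=0.1, size=(64, 16)), np.zeros(64)

def transmit(x):
    """Transmitter side: encode a 64-value chunk as 16 hidden outputs."""
    return np.tanh(W1 @ x + b1)

def receive(h):
    """Receiver side: decode 16 hidden outputs back into 64 values."""
    return np.tanh(W2 @ h + b2)

chunk = rng.uniform(-1, 1, size=64)
sent = transmit(chunk)      # only 16 values cross the channel
rebuilt = receive(sent)     # 64 values reconstructed at the receiver
print(sent.size, rebuilt.size)  # 16 64
```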
Actually, even though the bottleneck takes us from 64 nodes down to 16 nodes, no real compression has occurred, because unlike the 64 original inputs, which are 8-bit pixel values, the 16 hidden-layer outputs are real-valued (between -1 and 1) and would in principle require an infinite number of bits to transmit. True image compression occurs only when the hidden-layer outputs are quantized before transmission.
The Figure below shows a typical quantization scheme using 3 bits to encode each hidden output. In this case, there are 8 possible binary codes which may be formed: 000, 001, 010, 011, 100, 101, 110, 111. Each of these codes represents a range of values for a hidden unit output. For example, consider the first hidden output h1. When the value of h1 is between -1.0 and -0.75, the code 000 is transmitted; when h1 is between 0.25 and 0.5, 101 is transmitted. To compute the amount of image compression (measured in bits per pixel) for this level of quantization, we compute the ratio of the total number of bits transmitted (16 hidden outputs x 3 bits = 48 bits) to the total number of pixels in the original image (64); so in this case, the compression rate is 48/64 = 0.75 bits/pixel. Using 8-bit quantization of the hidden units gives a compression rate of (16 x 8)/64 = 2 bits/pixel.
The Quantization of Hidden Unit Outputs
The training of the neural net proceeds as follows: a 256x256 training image
is used to train the bottleneck-type network to learn the required identity
map. Training input-output pairs are produced from the training image by
extracting small 8x8 chunks of the image chosen at a uniformly random
location in the image. The easiest way to extract such a random chunk is
to generate a pair of random integers to serve as the upper left-hand
corner of the extracted chunk. In this case, we choose random integers
i and j, each between 0 and 248, and then (i,j) is the coordinate of the
upper left-hand corner of the extracted chunk. The pixel values of the
extracted image chunk are sent (left to right, top to bottom) through the
pixel-to-real mapping shown in the Figure below to construct the
64-dimensional neural net input x. Since the goal is to learn the identity
map, the desired target for the constructed input x is x itself; hence, the
training pair (x, x) is used to update the weights of the network.
The Pixel-to-Real and Real-to-Pixel Conversions
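One training-pair extraction step might look like the sketch below. The exact pixel-to-real mapping is given only in the Figure, so a linear map of [0, 255] onto [-1, 1] is assumed here, and the image is a random stand-in:

```python
import numpy as np

rng = np.random.default_rng(2)
image = rng.integers(0, 256, size=(256, 256))   # stand-in 256x256 training image

def pixel_to_real(p):
    # Assumed linear pixel-to-real conversion: 0 -> -1.0, 255 -> +1.0.
    return p / 127.5 - 1.0

# Random upper-left corner (i, j), each between 0 and 248 inclusive,
# so the 8x8 chunk stays inside the image.
i, j = rng.integers(0, 249, size=2)
chunk = image[i:i + 8, j:j + 8]

# Flatten left to right, top to bottom, and convert to reals.
x = pixel_to_real(chunk).ravel()

# Identity map: the desired target is the input itself.
target = x
print(x.shape)  # (64,)
```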
Once training is complete, image compression is demonstrated in the recall
phase. In this case, we still present the neural net with 8x8 chunks of
the image, but now instead of randomly selecting the location of each
chunk, we select the chunks in sequence from left to right and from
top to bottom. For each such 8x8 chunk, the output the network can be
computed and displayed on the screen to visually observe the
performance of neural net image compression. In addition, the 16
outputs of the hidden layer can be grouped into a 4x4 "compressed image",
which can be displayed as well.
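The recall-phase sweep can be sketched as follows: the image is scanned in non-overlapping 8x8 chunks, left to right and top to bottom, and the 16 hidden outputs of each chunk are arranged as a 4x4 block of the "compressed image". The encoder weights below are random placeholders for the trained ones, and the image is assumed to be already pixel-to-real mapped:

```python
import numpy as np

rng = np.random.default_rng(3)
W1, b1 = rng.normal(scale=0.1, size=(16, 64)), np.zeros(16)  # placeholder encoder
image = rng.uniform(-1, 1, size=(256, 256))   # stand-in, already in [-1, 1]

# 256/8 = 32 chunks per side; each maps to a 4x4 block, so the
# compressed image is 32*4 = 128 pixels per side.
compressed = np.zeros((128, 128))
for bi in range(32):
    for bj in range(32):
        chunk = image[bi*8:(bi+1)*8, bj*8:(bj+1)*8].ravel()
        h = np.tanh(W1 @ chunk + b1)              # 16 hidden outputs
        compressed[bi*4:(bi+1)*4, bj*4:(bj+1)*4] = h.reshape(4, 4)

print(compressed.shape)  # (128, 128)
```

Displaying `compressed` (after mapping its values back to pixel range) gives the quarter-size "compressed image" described above.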