Computer images are extremely data-intensive and hence require large amounts of memory for storage. As a result, transmitting an image from one machine to another can be very time consuming. Data compression techniques remove some of the redundant information contained in images, so that they require less storage space and less time to transmit. Neural nets can be used for image compression, as shown in the following demonstration.
A neural net architecture suitable for solving the image compression problem is shown below. This type of structure, in which a large input layer feeds into a small hidden layer, which in turn feeds into a large output layer, is referred to as a bottleneck-type network. The idea is this: suppose that the neural net shown below has been trained to implement the identity map. Then a tiny image presented to the network as input would appear exactly the same at the output layer.
Bottleneck-type Neural Net Architecture for Image Compression
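The 64-16-64 architecture described above can be sketched as a forward pass in numpy. This is a minimal illustration, not the demo's actual implementation; the weight values are random placeholders, and the tanh activation is assumed from the fact that the hidden outputs are later said to lie between -1 and 1.

```python
import numpy as np

# Hypothetical 64-16-64 bottleneck network; weights are random
# placeholders standing in for trained values.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(16, 64))   # input -> hidden (the bottleneck)
b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(64, 16))   # hidden -> output
b2 = np.zeros(64)

def forward(x):
    """Map a 64-dimensional input through the bottleneck and back."""
    h = np.tanh(W1 @ x + b1)   # 16 hidden outputs, each in (-1, 1)
    y = np.tanh(W2 @ h + b2)   # 64 reconstructed values
    return h, y

x = rng.uniform(-1, 1, size=64)   # one 8x8 chunk, flattened
h, y = forward(x)
print(h.shape, y.shape)  # (16,) (64,)
```

Once the net has been trained so that y approximates x, the 16-dimensional h carries (nearly) all the information in the 64-dimensional input.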
In this case, the network could be used for image compression by breaking it in two, as shown in the Figure below. The transmitter encodes and then transmits the output of the hidden layer (only 16 values, as compared to the 64 values of the original image). The receiver receives and decodes the 16 hidden outputs and generates the 64 outputs. Since the network implements an identity map, the output at the receiver is an exact reconstruction of the original image.
The Image Compression Scheme using the Trained Neural Net
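Splitting the trained net in two amounts to keeping the first layer on the transmitter side and the second layer on the receiver side. A sketch, again with random placeholder weights standing in for the trained ones:

```python
import numpy as np

# Placeholder weights; in the actual scheme these come from training.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(scale=0.1, size=(16, 64)), np.zeros(16)
W2, b2 = rng.normal(scale=0.1, size=(64, 16)), np.zeros(64)

def transmit(x):
    """Transmitter side: encode a 64-value chunk as 16 hidden outputs."""
    return np.tanh(W1 @ x + b1)

def receive(h):
    """Receiver side: decode 16 hidden outputs back into 64 values."""
    return np.tanh(W2 @ h + b2)

chunk = rng.uniform(-1, 1, size=64)
sent = transmit(chunk)      # only 16 values cross the channel
rebuilt = receive(sent)     # 64 values reconstructed at the receiver
print(sent.size, rebuilt.size)  # 16 64
```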
Actually, even though the bottleneck takes us from 64 nodes down to 16 nodes, no real compression has occurred, because unlike the 64 original inputs, which are 8-bit pixel values, the 16 hidden-layer outputs are real-valued (between -1 and 1) and would in principle require an infinite number of bits to transmit. True image compression occurs only when the hidden-layer outputs are quantized before transmission.
The Figure below shows a typical quantization scheme using 3 bits to encode each hidden output. In this case, there are 8 possible binary codes which may be formed: 000, 001, 010, 011, 100, 101, 110, 111. Each of these codes represents a range of values for a hidden unit output. For example, consider the first hidden output h1. When the value of h1 is between -1.0 and -0.75, the code 000 is transmitted; when h1 is between 0.25 and 0.5, 101 is transmitted. To compute the amount of image compression (measured in bits per pixel) for this level of quantization, we compute the ratio of the total number of bits transmitted (16 hidden outputs x 3 bits = 48 bits) to the total number of pixels in the original image (64); so in this case, the compression rate is 48/64 = 0.75 bits/pixel. Using 8-bit quantization of the hidden units gives a compression rate of (16 x 8)/64 = 2 bits/pixel.
The Quantization of Hidden Unit Outputs
The training of the neural net proceeds as follows: a 256x256 training image
is used to train the bottleneck-type network to learn the required identity
map. Training input-output pairs are produced from the training image by
extracting small 8x8 chunks of the image chosen at a uniformly random
location in the image. The easiest way to extract such a random chunk is
to generate a pair of random integers to serve as the upper left-hand
corner of the extracted chunk. In this case, we choose random integers
i and j, each between 0 and 248, and then (i,j) is the coordinate of the
upper left-hand corner of the extracted chunk. The pixel values of the
extracted image chunk are sent (left to right, top to bottom) through the
pixel-to-real mapping shown in the Figure below to construct the
64-dimensional neural net input x. Since the goal is to learn the identity
map, the desired target for the constructed input x is x itself; hence, the
training pair (x, x) is used to update the weights of the network.
The Pixel-to-Real and Real-to-Pixel Conversions
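One training-pair extraction step might look like the sketch below. The exact pixel-to-real mapping is given only in the Figure, so a linear map of [0, 255] onto [-1, 1] is assumed here, and the image is a random stand-in:

```python
import numpy as np

rng = np.random.default_rng(2)
image = rng.integers(0, 256, size=(256, 256))   # stand-in 256x256 training image

def pixel_to_real(p):
    # Assumed linear pixel-to-real conversion: 0 -> -1.0, 255 -> +1.0.
    return p / 127.5 - 1.0

# Random upper-left corner (i, j), each between 0 and 248 inclusive,
# so the 8x8 chunk stays inside the image.
i, j = rng.integers(0, 249, size=2)
chunk = image[i:i + 8, j:j + 8]

# Flatten left to right, top to bottom, and convert to reals.
x = pixel_to_real(chunk).ravel()

# Identity map: the desired target is the input itself.
target = x
print(x.shape)  # (64,)
```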
Once training is complete, image compression is demonstrated in the recall
phase. In this case, we still present the neural net with 8x8 chunks of
the image, but now instead of randomly selecting the location of each
chunk, we select the chunks in sequence from left to right and from
top to bottom. For each such 8x8 chunk, the output the network can be
computed and displayed on the screen to visually observe the
performance of neural net image compression. In addition, the 16
outputs of the hidden layer can be grouped into a 4x4 "compressed image",
which can be displayed as well.
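The recall-phase sweep can be sketched as follows: the image is scanned in non-overlapping 8x8 chunks, left to right and top to bottom, and the 16 hidden outputs of each chunk are arranged as a 4x4 block of the "compressed image". The encoder weights below are random placeholders for the trained ones, and the image is assumed to be already pixel-to-real mapped:

```python
import numpy as np

rng = np.random.default_rng(3)
W1, b1 = rng.normal(scale=0.1, size=(16, 64)), np.zeros(16)  # placeholder encoder
image = rng.uniform(-1, 1, size=(256, 256))   # stand-in, already in [-1, 1]

# 256/8 = 32 chunks per side; each maps to a 4x4 block, so the
# compressed image is 32*4 = 128 pixels per side.
compressed = np.zeros((128, 128))
for bi in range(32):
    for bj in range(32):
        chunk = image[bi*8:(bi+1)*8, bj*8:(bj+1)*8].ravel()
        h = np.tanh(W1 @ chunk + b1)              # 16 hidden outputs
        compressed[bi*4:(bi+1)*4, bj*4:(bj+1)*4] = h.reshape(4, 4)

print(compressed.shape)  # (128, 128)
```

Displaying `compressed` (after mapping its values back to pixel range) gives the quarter-size "compressed image" described above.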