Digital images, such as the photographs taken on your phone, are displayed as a rectangular array of pixels. For example, the photograph in Figure 4.4.1 is 1440 pixels wide and 1468 pixels high. If we were to zoom in on the photograph, we would be able to see individual pixels, such as those shown on the right.
A lot of data is required to display this image. A quantity of digital data is frequently measured in bytes, where one byte is the amount of storage needed to record an integer between 0 and 255. As we will see shortly, each pixel requires three bytes to record that pixel’s color. This means the amount of data required to display this image is \(3\by1440\by1468 = 6,341,760\) bytes or about 6.3 megabytes.
Of course, we would like to store this image on a phone or computer and perhaps transmit it through our data plan to share it with others. If possible, we would like to find a way to represent this image using a smaller amount of data so that we don’t run out of memory on our phone and quickly exhaust our data plan.
As we will see in this section, the JPEG compression algorithm provides a means for doing just that. This image, when stored in the JPEG format, requires only 467,359 bytes of data, which is about 7% of the 6.3 megabytes required to display the image. That is, when we display this image, we are reconstructing it from only 7% of the original data. This isn’t too surprising since there is quite a bit of redundancy in the image; the left half of the image is almost uniformly blue. The JPEG algorithm detects this redundancy by representing the data using bases that are well-suited to the task.
Preview Activity4.4.1.
Since we will be using various bases and the coordinate systems they define, let’s review how to translate between coordinate systems.
Suppose that we have a basis \(\bcal=\{\vvec_1,\vvec_2,\ldots,\vvec_m\}\) for \(\real^m\text{.}\) Explain what we mean by the representation \(\coords{\xvec}{\bcal}\) of a vector \(\xvec\) in the coordinate system defined by \(\bcal\text{.}\)
If we are given the representation \(\coords{\xvec}{\bcal}\text{,}\) how can we recover the vector \(\xvec\text{?}\)
If we are given the vector \(\xvec\text{,}\) how can we find \(\coords{\xvec}{\bcal}\text{?}\)
The components of the vector \(\coords{\xvec}{\bcal}\) are the weights that express \(\xvec\) as a linear combination of the basis vectors; that is, \(\coords{\xvec}{\bcal} =
\cfourvec{c_1}{c_2}{\vdots}{c_m}\) if \(\xvec=c_1\vvec_1+c_2\vvec_2+\ldots+c_m\vvec_m\text{.}\)
If we form the matrix \(P_{\bcal} =
\left[\begin{array}{rrrr}
\vvec_1 \amp \vvec_2 \amp \ldots \amp \vvec_m
\end{array}\right]\text{,}\) then \(\xvec=P_{\bcal}\coords{\xvec}{\bcal}\text{.}\)
As before, \(\coords{\xvec}{\bcal}=P_{\bcal}^{-1}\xvec\text{.}\)
We find \(\xvec=P_{\bcal}\coords{\xvec}{\bcal} =
\twovec{-1}{1}\text{.}\)
We find \(\coords{\xvec}{\bcal}=P_{\bcal}^{-1}\xvec =
\twovec{-3}{5}\text{.}\)
Subsection4.4.1Color models
A color is represented digitally by a vector in \(\real^3\text{.}\) There are different ways in which we can represent colors, however, depending on whether a computer or a human will be processing the color. We will describe two of these representations, called color models, and demonstrate how they are used in the JPEG compression algorithm.
Digital displays typically create colors by blending together various amounts of red, green, and blue. We can therefore describe a color by putting its constituent amounts of red, green, and blue into a vector \(\threevec{R}{G}{B}\text{.}\) The quantities \(R\text{,}\)\(G\text{,}\) and \(B\) are stored with one byte of information so they are integers between 0 and 255. This is called the RGB color model.
We define a basis \(\bcal=\{\vvec_1,\vvec_2,\vvec_3\}\) where
The coordinate \(Y\) is called luminance while \(C_b\) and \(C_r\) are called blue and red chrominance, respectively. In this coordinate system, luminance will vary from 0 to 255, while the chrominances vary between -127.5 and 127.5. This is known as the \(YC_bC_r\) color model, also denoted YCbCr. (To be completely accurate, we should add 127.5 to the chrominance values so that they lie between 0 and 255, but we won’t worry about that here.)
Activity4.4.2.
This activity investigates these two color models, which we view as coordinate systems for describing colors.
First, we will explore the \(RGB\) color model.
What happens when \(G=0\text{,}\)\(B=0\) (pushed all the way to the left), and \(R\) is allowed to vary?
What happens when \(R=0\text{,}\)\(G=0\text{,}\) and \(B\) is allowed to vary?
How can you create black in this color model?
How can you create white?
Next, we will explore the \(YC_bC_r\) color model.
What happens when \(C_b=0\) and \(C_r=0\) (kept in the center) and \(Y\) is allowed to vary?
What happens when \(Y=0\) (pushed to the left), \(C_r=0\) (kept in the center), and \(C_b\) is allowed to increase between 0 and 127.5?
What happens when \(Y=0\text{,}\)\(C_b=0\text{,}\) and \(C_r\) is allowed to increase between 0 and 127.5?
How can you create black in this color model?
How can you create white?
Verify that \(\bcal\) is a basis for \(\real^3\text{.}\)
Find the matrix \(P_{\bcal}\) that converts from \(\threevec{Y}{C_b}{C_r}\) coordinates into \(\threevec{R}{G}{B}\) coordinates. Then find the matrix \(P_{\bcal}^{-1}\) that converts from \(\threevec{R}{G}{B}\) coordinates back into \(\threevec{Y}{C_b}{C_r}\) coordinates.
Find the \(\threevec{Y}{C_b}{C_r}\) coordinates for the following colors and check, using the diagrams above, that the two representations agree.
Pure red is \(\threevec{R}{G}{B}=
\threevec{255}{0}{0}\text{.}\)
Pure blue is \(\threevec{R}{G}{B}=
\threevec{0}{0}{255}\text{.}\)
Pure white is \(\threevec{R}{G}{B}=
\threevec{255}{255}{255}\text{.}\)
Pure black is \(\threevec{R}{G}{B}=
\threevec{0}{0}{0}\text{.}\)
Find the \(\threevec{R}{G}{B}\) coordinates for the following colors and check, using the diagrams above, that the two representations agree.
Working with the \(RGB\) color model, we find that
we produce red with varying degrees of brightness.
we produce blue with varying degrees of brightness.
\(R=G=B=0\text{.}\)
\(R=G=B=255\text{.}\)
Working with the \(YC_bC_r\) color model, we find that
we produce gray with varying degrees of brightness.
we produce blue with varying degrees of brightness.
we produce red with varying degrees of brightness.
\(Y=C_b=C_r = 0\text{.}\)
\(Y=255\) with \(C_b=C_r=0\text{.}\)
If we row reduce the matrix whose columns are the vectors in \(\bcal\text{,}\) we obtain the identity matrix, which means that the vectors are linearly independent and span \(\real^3\text{.}\)
The expression for \(Y\) is a weighted average of the \(R\text{,}\)\(G\text{,}\) and \(B\) values. The expression for \(C_b\) takes half the amount of \(B\) and subtracts the amounts of red and green. Likewise, the expression for \(C_r\) takes half the amount of \(R\) and subtracts the amounts of green and blue.
These two color models provide us with two ways to represent colors, each of which is useful in a certain context. Digital displays, such as those in phones and computer monitors, create colors by combining various amounts of red, green, and blue. The \(RGB\) model is therefore most relevant in digital applications.
By contrast, the \(YC_bC_r\) color model was created based on research into human vision and aims to concentrate the most visually important data into a single coordinate, the luminance, to which our eyes are most sensitive. Of course, any basis of \(\real^3\) must have three vectors so we need two more coordinates, blue and red chrominance, if we want to represent all colors.
To see this explicitly, shown in Figure 4.4.4 is the original image and the image as rendered with only the luminance. That is, on the right, the color of each pixel is represented by only one byte, which is the luminance. This image essentially looks like a grayscale version of the original image with all its visual detail. In fact, before digital television became the standard, television signals were broadcast using the \(YC_bC_r\) color model. When a signal was displayed on a black-and-white television, the luminance was displayed and the two chrominance values simply ignored.
For comparison, shown in Figure 4.4.5 are the corresponding images created using only the blue chrominance and the red chrominance. Notice that the amount of visual detail is considerably less in these images.
The aim of the JPEG compression algorithm is to represent an image using the smallest amount of data possible. By converting from the \(RGB\) color model to the \(YC_bC_r\) color model, we are concentrating the most visually important data into the luminance values. This is helpful because we can safely ignore some of the data in the chrominance values since that data is not as visually important.
Subsection4.4.2The JPEG compression algorithm
The key to representing the image using a smaller amount of data is to detect redundancies in the data. To begin, we first break the image, which is composed of \(1440\by 1468\) pixels, into small \(8\by8\) blocks of pixels. For example, we will consider the \(8\by8\) block of pixels outlined in green in the original image, shown on the left of Figure 4.4.6. The image on the right zooms in on the block.
Notice that this block, as seen in the original image, is very small. If we were to change some of the colors in this block slightly, our eyes would probably not notice.
Following our earlier work, we will change the representation of colors from the \(RGB\) color model to the \(YC_bC_r\) model. This separates the colors into luminance and chrominance values that we will consider separately. In Figure 4.4.8, we see the luminance values of this block. Again, notice how these values do not vary significantly over the block.
Our strategy in the compression algorithm is to perform a change of basis to take advantage of the fact that the luminance values do not change significantly over the block. Rather than recording the luminance of each of the pixels, this change of basis will allow us to record the average luminance along with some information about how the individual colors vary from the average.
Let’s look at the first column of luminance values, which is a vector in \(\real^8\text{:}\)
On first glance, this probably looks intimidating, but we can make sense of it by looking at these vectors graphically. Shown in Figure 4.4.9 are four of these basis vectors. Notice that \(\vvec_0\) is constantly 1, \(\vvec_1\) varies relatively slowly, \(\vvec_2\) varies a little more rapidly, and \(\vvec_7\) varies quite rapidly. The main thing to notice is that the basis vectors vary at different rates with the first vectors varying relatively slowly and the later vectors varying more rapidly.
These vectors form the basis \(\bcal\) for \(\real^8\text{.}\) Remember that \(\xvec\) is the vector of luminance values in the first column as seen on the right. We will write \(\xvec\) in the new coordinates
The coordinates \(F_j\) are called the Fourier coefficients of the vector \(\xvec\text{.}\)
Activity4.4.3.
We will explore the influence that the Fourier coefficients have on the vector \(\xvec\text{.}\)
To begin, we’ll look at the Fourier coefficient \(F_0\text{.}\)
Describe the effect that \(F_0\) has on the vector \(\xvec\text{.}\) Would you describe the components in \(\xvec\) as constant, slowly varying, or rapidly varying?
By comparison, let’s see how the Fourier coefficient \(F_3\) influences \(\xvec\text{.}\)
Describe the effect that \(F_3\) has on the vector \(\xvec\text{.}\) Would you describe the components in \(\xvec\) as constant, slowly varying, or rapidly varying?
Let’s now investigate how the Fourier coefficient \(F_7\) influences the vector \(\xvec\text{.}\)
Describe the effect that \(F_7\) has on the vector \(\xvec\text{.}\) Would you describe the components in \(\xvec\) as constant, slowly varying, or rapidly varying?
If the components of \(\xvec\) vary relatively slowly, what would you expect to be true of the Fourier coefficients \(F_j\text{?}\)
The Python cell below will construct the vector \(P_{\bcal}\text{,}\) which is denoted P, and its inverse \(P_{\bcal}^{-1}\text{,}\) which is denoted Pinv. Evaluate this Python cell and notice that it prints the matrix \(P_{\bcal}^{-1}\text{.}\)
Now look at the form of \(P_{\bcal}^{-1}\) and explain why \(F_0\) is the average of the luminance values in the vector \(\xvec\text{.}\)
The Python cell below defines the vector \(\xvec\text{,}\) which is the vector of luminance values in the first column, as seen in Figure 4.4.8. Use the cell below to find the vector \(\fvec\) of Fourier coefficients \(F_0,F_1,\ldots,F_7\text{.}\) If you have evaluated the cell above, you will still be able to refer to P and Pinv in this cell.
Write the Fourier coefficients and discuss the relative sizes of the coefficients.
Let’s see what happens when we simply ignore the coefficients \(F_6\) and \(F_7\text{.}\) Form a new vector of Fourier coefficients by rounding the coefficients to the nearest integer and setting \(F_6\) and \(F_7\) to zero. This is an approximation to \(\fvec\text{,}\) the vector of Fourier coefficients. Use the approximation to \(\fvec\) to form an approximation of the vector \(\xvec\text{.}\)
How much does your approximation differ from the actual vector \(\xvec\text{?}\)
When we ignore the Fourier coefficients corresponding to rapidly varying basis elements, we see that the vector \(\xvec\) that we reconstruct is very close to the original one. In fact, the luminance values in the approximation differ by at most one or two from the actual luminance values. Our eyes are not sensitive enough to detect this difference.
So far, we have concentrated on only one column in our \(8\by8\) block of luminance values. Let’s now consider all of the columns. The following Python cell defines a matrix called luminance, which is the \(8\by8\) matrix of luminance values. Find the \(8\by8\) matrix \(F\) whose columns are the Fourier coefficients of the columns of luminance values.
Notice that the first row of this matrix consists of the Fourier coefficient \(F_0\) for each of the columns. Just as we saw before, the entries in this row do not change significantly as we move across the row. In the Sage cell below, write these entries in the vector \(\yvec\) and find the corresponding Fourier coefficients.
Notice that the largest Fourier coefficient is \(F_0=151.6\) and that the coefficients \(F_4\text{,}\)\(F_5\text{,}\)\(F_6\text{,}\) and \(F_7\) are relatively small. This reflects the fact that the luminance does not vary rapidly.
We see that the Fourier coefficients of these Fourier coefficients is \(\left[\begin{array}{c}
161 \\ -3.8 \\ -4.7 \\ -1.6 \\ 0 \\ 0.9 \\ 0.8 \\ 0.4
\end{array}\right]
\text{.}\) Once again, we see that \(F_0\) is the largest Fourier coefficient and the coefficients corresponding to rapid variations are small.
Up to this point, we have been working with the luminance values in one \(8\by8\) block of our image. We formed the Fourier coefficients for each of the columns of this block. Once we notice that the Fourier coefficients across a row are relatively constant, it seems reasonable to find the Fourier coefficients of the rows of the matrix of Fourier coefficients. Doing so leads to the matrix
If we were to look inside a JPEG image file, we would see lots of matrices like this. For each \(8\by 8\) block, there would be three matrices of Fourier coefficients of the rows of Fourier coefficients, one matrix for each of the luminance, blue chrominance, and red chrominance values. However, we store these Fourier coefficients as integers inside the JPEG file so we need to round off the coefficients to the nearest integer, as shown here:
There are many zeroes in this matrix, and we can save space in a JPEG image file by only recording the nonzero Fourier coefficients.
In fact, when a JPEG file is created, there is a “quality” parameter that can be set, such as that shown in Figure 4.4.13. When the quality parameter is high, we will store many of the Fourier coefficients; when it is low, we will ignore more of them.
To see how this works, suppose the quality setting is relatively high. After rounding off the Fourier coefficients, we will set all of the coefficients whose absolute value is less than 2 to zero, which creates the matrix:
Notice that there are 12 nonzero Fourier coefficients, out of 64, that we need to record. Consequently, we only save \(12/64 \approx
19\%\) of the data.
If instead, the quality setting is relatively low, we set all of the Fourier coefficients whose absolute value is less than 4 to zero, creating the matrix:
Notice that there are only 5 nonzero Fourier coefficients that we need to record now, meaning we save only \(5/64\approx 8\%\) of the data. This will result in a smaller JPEG file describing the image.
With a lower quality setting, we have thrown away more information about the Fourier coefficients so the image will not be reconstructed as accurately. To see this, we can reconstruct the luminance values from the Fourier coefficients by converting back into the standard coordinate system. Rather than showing the luminance values themselves, we will show the difference in the original luminance values and the reconstructed luminance values. When the quality setting was high and we stored 12 Fourier coefficients, we find this difference to be
This demonstrates the trade off. With a high quality setting, we require more storage to save more of the data, but the reconstructed image is closer to the original. With the lower quality setting, we require less storage, but the reconstructed image differs more from the original.
If we remember that the visual information stored by the blue and red chrominance values is not as important as that contained in the luminance values, we feel safer in discarding more of the Fourier coefficients for the chrominance values resulting in an even greater savings.
Shown in Figure 4.4.14 is the original image compared to a version stored with a very low quality setting. If you look carefully, you can individual \(8\by8\) blocks.
This discussion of the JPEG compression algorithm is meant to explore the ideas that underlie its construction and demonstrate the importance of a choice of basis and its accompanying coordinate system. There are a few details, most notably about the rounding of the Fourier coefficients, that are not strictly accurate. The actual implementation is a little more complicated, but the presentation here conveys the spirit of the algorithm.
The JPEG compression algorithm allows us to store image files using only a fraction of the data. Similar ideas are used to efficiently store digital music and video files.
Subsection4.4.3Summary
This section has explored how appropriate changes in bases help us reconstruct an image using only a fraction of its data. This is known as image compression.
There are several ways of representing colors, all of which use vectors in \(\real^3\text{.}\) We explored the \(RGB\) color model, which is appropriate in digital applications, and the \(YC_bC_r\) model, in which the most important visual information is conveyed by the \(Y\) component, known as luminance.
We also explored a change of basis called the Discrete Fourier Transform. In the coordinate system that results, the first coefficient measures the average of the components of a vector. Other coefficients measure variations in the components away from the average.
We put both of these ideas to use in demonstrating the JPEG compression algorithm. An image is broken into \(8\by8\) blocks, and the colors into luminance, blue chrominance, and red chrominance. Applying the Discrete Fourier Transform allows us to reconstruct a good approximation of the image using only a fraction of the original data.
In the Sage cell below is a copy of the change of basis matrices that define the Fourier transform. Find the Fourier coefficients of \(\xvec\text{.}\)
We will now form the vector \(\yvec\text{,}\) which is an approximation of \(\xvec\) by rounding all the Fourier coefficients of \(\xvec\) to the nearest integer to obtain \(\coords{\yvec}{\ccal}\text{.}\) Now find the vector \(\yvec\) and compare this approximation to \(\xvec\text{.}\) What is the error in this approximation?
Repeat the last part of this problem, but set the rounded Fourier coefficients to zero if they have an absolute value less than five. Use it to create a second approximation of \(\xvec\text{.}\) What is the error in this approximation?
Compare the number of nonzero Fourier coefficients that you have in the two approximations and compare the accuracy of the approximations. Using a few sentences, discuss the comparisons that you find.
2.
There are several steps to the JPEG compression algorithm. The following questions examine the motivation behind some of them.
What is the overall goal of the JPEG compression algorithm?
Why do we convert colors from the the \(RGB\) color model to the \(YC_bC_r\) model?
Why do we decompose the image into a collection of \(8\by8\) arrays of pixels?
What role does the Discrete Fourier Transform play in the JPEG compression algorithm?
Why is the information conveyed by the rapid-variation Fourier coefficients, generally speaking, less important than the slow-variation coefficients?
3.
The Fourier transform that we used in this section is often called the Discrete Fourier Cosine Transform because it is defined using a basis \(\ccal\) consisting of cosine functions. There is also a Fourier Sine Transform defined using a basis \(\scal\) consisting of sine functions. For instance, in \(\real^4\text{,}\) the basis vectors of \(\scal\) are
We can think of these vectors graphically, as shown in Figure 4.4.15.
The Sage cell below defines the matrix S whose columns are the vectors in the basis \(\scal\) as well as the matrix C whose columns form the basis \(\ccal\) used in the Fourier Cosine Transform.
In the \(8\by8\) block of luminance values we considered in this section, the first column begins with the four entries 176, 181, 165, and 139, as seen in Figure 4.4.8. These form the vector \(\xvec=\fourvec{176}{181}{165}{139}\text{.}\) Find both \(\coords{\xvec}{\scal}\) and \(\coords{\xvec}{\ccal}\text{.}\)
Write a sentence or two comparing the values for the Fourier Sine coefficients \(\coords{\xvec}{\scal}\) and the Fourier Cosine coefficients \(\coords{\xvec}{\ccal}\text{.}\)
Suppose now that \(\xvec=\fourvec{100}{100}{100}{100}\text{.}\) Find the Fourier Sine coefficients \(\coords{\xvec}{\scal}\) and the Fourier Cosine coefficients \(\coords{\xvec}{\ccal}\text{.}\)
Write a few sentences explaining why we use the Fourier Cosine Transform in the JPEG compression algorithm rather than the Fourier Sine Transform.
4.
In Example 4.3.9, we looked at a basis for \(\real^4\) that we called the Haar wavelet basis. The basis vectors are
The following Sage cell defines the matrix W whose columns are the basis vectors in \(\wcal\text{.}\) If \(\xvec\) is the first column of luminance values in the \(4\by4\) block above, find the wavelet coefficients \(\coords{\xvec}{\wcal}\text{.}\)
Notice that \(H_1\) gives the average value of the components of \(\xvec\) and \(H_2\) describes how the averages of the first two and last two components differ from the overall average. The coefficients \(H_3\) and \(H_4\) describe small-scale variations between the first two components and last two components, respectively.
If we set the last wavelet coefficients \(H_3=0\) and \(H_4=0\text{,}\) we obtain the wavelet coefficients \(\coords{\yvec}{\wcal}\) for a vector \(\yvec\) that approximates \(\xvec\text{.}\) Find the vector \(\yvec\) and compare it to the original vector \(\xvec\text{.}\)
What impact does the fact that \(H_3=0\) and \(H_4=0\) have on the form of the vector \(\yvec\text{?}\) Explain how setting these coefficients to zero ignores the behavior of \(\xvec\) on a small scale.
In the JPEG compression algorithm, we looked at the Fourier coefficients of all the columns of luminance values and then performed a Fourier transform on the rows. The Sage cell below will perform the same operation using the wavelet transform; that is, it will first find the wavelet coefficients of each of the columns and then perform the wavelet transform on the rows. You only need to evaluate the cell to find the wavelet coefficients obtained in this way.
Now set all the wavelet coefficients equal to zero except those in the upper left \(2\by2\) block and use them to define the matrix coeffs in the Sage cell below. This has the effect of ignoring all of the small-scale differences. Evaluating this cell will recover the approximate luminance values.
Explain how the wavelet transform and this approximation can be used to create a lower resolution version of the image.
This kind of wavelet transform is the basis of the JPEG 2000 compression algorithm, which is an alternative to the usual JPEG algorithm.
5.
In this section, we looked at the \(RGB\) and \(YC_bC_r\) color models. In this exercise, we will look at the \(HSV\) color model where \(H\) is the hue, \(S\) is the saturation, and \(V\) is the value of the color. All three quantities vary between 0 and 255.
If you leave \(S\) and \(V\) at some fixed values, what happens when you change the value of \(H\text{?}\)
Increase the value \(V\) to 255, which is on the far right. Describe what happens when you vary the saturation \(S\) using a fixed hue \(H\) and value \(V\text{.}\)
Describe what happens when \(H\) and \(S\) are fixed and \(V\) varies.
How can you create white in this color model?
How can you create black in this color model?
Find an approximate range of hues that correspond to blue.
Find an approximate range of hues that correspond to green.
The \(YC_bC_r\) color model concentrates the most important visual information in the luminance coordinate, which roughly measures the brightness of the color. The other two coordinates describe the hue of the color. By contrast, the \(HSV\) color model concentrates all the information about the hue in the \(H\) coordinate.
This is useful in computer vision applications. For instance, if we want a robot to detect a blue ball in its field of vision, we can specify a range of hue values to search for. If the lighting changes in the room, the saturation and value may change, but the hue will not. This increases the likelihood that the robot will still detect the blue ball across a wide range of lighting conditions.