Section 6.4 Finding orthogonal bases

The last section demonstrated the value of working with orthogonal, and especially orthonormal, sets. If we have an orthogonal basis \(\wvec_1,\wvec_2,\ldots,\wvec_n\) for a subspace \(W\text{,}\) the Projection Formula 6.3.14 tells us that the orthogonal projection of a vector \(\bvec\) onto \(W\) is
\begin{align*} \bhat \amp = \proj{\bvec}{W}\\ \amp = \proj{\bvec}{\wvec_1} + \proj{\bvec}{\wvec_2} + \cdots + \proj{\bvec}{\wvec_n} \\ \amp = \frac{\bvec\cdot\wvec_1}{\wvec_1\cdot\wvec_1}\wvec_1 + \frac{\bvec\cdot\wvec_2}{\wvec_2\cdot\wvec_2}\wvec_2 + \cdots + \frac{\bvec\cdot\wvec_n}{\wvec_n\cdot\wvec_n}\wvec_n\text{.} \end{align*}
An orthonormal basis \(\uvec_1,\uvec_2,\ldots,\uvec_n\) is even more convenient: after forming the matrix \(Q=\begin{bmatrix} \uvec_1 \amp \uvec_2 \amp \ldots \amp \uvec_n \end{bmatrix}\text{,}\) we have
\begin{equation*} \bhat = QQ^{\transpose}\bvec\text{.} \end{equation*}
In the examples we’ve seen so far, however, orthogonal bases were given to us. What we need now is a way to form orthogonal bases. In this section, we’ll explore an algorithm that begins with any basis for a subspace and creates an orthogonal basis. Once we have an orthogonal basis, we can scale each of the vectors appropriately to produce an orthonormal basis.

Preview Activity 6.4.1.

Suppose we have a basis for \(\real^2\) consisting of the vectors
\begin{equation*} \vvec_1=\twovec11,\hspace{24pt} \vvec_2=\twovec02 \end{equation*}
as shown in Figure 6.4.1. Notice that this basis is not orthogonal.
Figure 6.4.1. A basis for \(\real^2\text{.}\)
  1. Find the vector \(\vhat_2 = \proj{\vvec_2}{\vvec_1}\text{.}\)
  2. Explain why \(\vvec_2 - \vhat_2\) is orthogonal to \(\vvec_1\text{.}\)
  3. Define the new vectors \(\wvec_1=\vvec_1\) and \(\wvec_2=\vvec_2-\vhat_2\) and sketch them in Figure 6.4.2. Explain why \(\wvec_1\) and \(\wvec_2\) define an orthogonal basis for \(\real^2\text{.}\)
    Figure 6.4.2. Sketch the new basis \(\wvec_1\) and \(\wvec_2\text{.}\)
  4. Write the vector \(\bvec=\twovec8{-10}\) as a linear combination of \(\wvec_1\) and \(\wvec_2\text{.}\)
  5. Scale the vectors \(\wvec_1\) and \(\wvec_2\) to produce an orthonormal basis \(\uvec_1\) and \(\uvec_2\) for \(\real^2\text{.}\)
Solution.
  1. \(\vhat_2 = \vvec_1\text{.}\)
  2. The orthogonal projection \(\vhat_2\) is defined so that \(\vvec_2-\vhat_2\) is orthogonal to \(\vvec_1\text{.}\)
  3. \(\wvec_1 = \twovec11\) and \(\wvec_2=\twovec{-1}1\) are orthogonal, which can be checked with the dot product.
  4. The projection formula gives \(\bvec = -\wvec_1 -9\wvec_2\text{.}\)
  5. \(\uvec_1=\frac1{\sqrt{2}}\twovec11\) and \(\uvec_2=\frac1{\sqrt{2}}\twovec{-1}1\text{.}\)

Subsection 6.4.1 Gram-Schmidt orthogonalization

The preview activity illustrates the main idea behind an algorithm, known as Gram-Schmidt orthogonalization, that begins with a basis for some subspace of \(\real^m\) and produces an orthogonal or orthonormal basis. The algorithm relies on our construction of the orthogonal projection. Remember that we formed the orthogonal projection \(\bhat\) of \(\bvec\) onto a subspace \(W\) by requiring that \(\bvec-\bhat\) is orthogonal to \(W\) as shown in Figure 6.4.3.
Figure 6.4.3. If \(\bhat\) is the orthogonal projection of \(\bvec\) onto \(W\text{,}\) then \(\bvec-\bhat\) is orthogonal to \(W\text{.}\)
This observation guides our construction of an orthogonal basis because it allows us to create a vector that is orthogonal to a given subspace. Let’s see how the Gram-Schmidt algorithm works.

Activity 6.4.2.

Suppose that \(W\) is a three-dimensional subspace of \(\real^4\) with basis:
\begin{equation*} \vvec_1 = \fourvec1111,\hspace{24pt} \vvec_2 = \fourvec1322,\hspace{24pt} \vvec_3 = \fourvec1{-3}{-3}{-3}\text{.} \end{equation*}
We can see that this basis is not orthogonal by noting that \(\vvec_1\cdot\vvec_2 = 8\text{.}\) Our goal is to create an orthogonal basis \(\wvec_1\text{,}\) \(\wvec_2\text{,}\) and \(\wvec_3\) for \(W\text{.}\)
To begin, we declare that \(\wvec_1=\vvec_1\text{,}\) and we call \(W_1\) the line defined by \(\wvec_1\text{.}\) We then form \(\wvec_2\) by subtracting from \(\vvec_2\) its orthogonal projection onto \(W_1\text{:}\)
\begin{equation*} \wvec_2 = \vvec_2 - \proj{\vvec_2}{\wvec_1}\text{.} \end{equation*}
  1. Find the vector \(\vhat_2 = \proj{\vvec_2}{\vvec_1}\text{.}\)
  2. Form the vector \(\wvec_2 = \vvec_2-\vhat_2\) and verify that it is orthogonal to \(\wvec_1\text{.}\)
  3. Explain why \(\laspan{\wvec_1,\wvec_2} = \laspan{\vvec_1,\vvec_2}\) by showing that any linear combination of \(\vvec_1\) and \(\vvec_2\) can be written as a linear combination of \(\wvec_1\) and \(\wvec_2\) and vice versa.
  4. The vectors \(\wvec_1\) and \(\wvec_2\) are an orthogonal basis for a two-dimensional subspace \(W_2\) of \(\real^4\text{.}\) Find the vector \(\vhat_3\) that is the orthogonal projection of \(\vvec_3\) onto \(W_2\text{.}\)
  5. Verify that \(\wvec_3 = \vvec_3-\vhat_3\) is orthogonal to both \(\wvec_1\) and \(\wvec_2\text{.}\)
  6. Explain why \(\wvec_1\text{,}\) \(\wvec_2\text{,}\) and \(\wvec_3\) form an orthogonal basis for \(W\text{.}\)
  7. Now find an orthonormal basis for \(W\text{.}\)
Answer.
  1. \(\widehat{\vvec}_2 = \fourvec2222\text{.}\)
  2. \(\displaystyle \wvec_2 = \fourvec{-1}100\)
  3. We have \(\vvec_1 = \wvec_1\) and \(\vvec_2 = \wvec_2 + 2\wvec_1\) so a linear combination of \(\vvec_1\) and \(\vvec_2\) can be rewritten as a linear combination of \(\wvec_1\) and \(\wvec_2\text{.}\)
  4. \(\widehat{\vvec}_3 = \fourvec0{-4}{-2}{-2}\text{.}\)
  5. \(\displaystyle \wvec_3 = \fourvec11{-1}{-1}\)
  6. Since \(\vvec_i\) can be written in terms of \(\wvec_j\) and vice-versa, these new vectors form a basis for \(W\text{.}\)
  7. \begin{equation*} \uvec_1=\fourvec{1/2}{1/2}{1/2}{1/2},\hspace{24pt} \uvec_2=\fourvec{-1/\sqrt{2}}{1/\sqrt{2}}00,\hspace{24pt} \uvec_3=\fourvec{1/2}{1/2}{-1/2}{-1/2} \end{equation*}
Solution.
  1. \(\widehat{\vvec}_2 = \frac{\vvec_2\cdot\wvec_1} {\wvec_1\cdot\wvec_1}\wvec_1=\fourvec2222\text{.}\)
  2. \(\displaystyle \wvec_2 = \fourvec{-1}100\)
  3. We have \(\vvec_1 = \wvec_1\) and \(\vvec_2 = \wvec_2 + 2\wvec_1\text{.}\) Therefore, a linear combination of \(\vvec_1\) and \(\vvec_2\) can be rewritten as
    \begin{equation*} c_1\vvec_1+c_2\vvec_2 = c_1\wvec_1+c_2(\wvec_2+2\wvec_1) = (c_1+2c_2)\wvec_1 + c_2\wvec_2\text{.} \end{equation*}
    In the same way, \(\wvec_1=\vvec_1\) and \(\wvec_2 = \vvec_2-2\vvec_1\) so any linear combination of \(\wvec_1\) and \(\wvec_2\) can be rewritten as a linear combination of \(\vvec_1\) and \(\vvec_2\text{.}\)
  4. By the Projection Formula, \(\widehat{\vvec}_3 = \fourvec0{-4}{-2}{-2}\text{.}\)
  5. \(\displaystyle \wvec_3 = \fourvec11{-1}{-1}\)
  6. We can check that \(\wvec_1\text{,}\) \(\wvec_2\text{,}\) and \(\wvec_3\) form an orthogonal set. Since \(\vvec_i\) can be written in terms of \(\wvec_j\) and vice-versa, these new vectors form a basis for \(W\text{.}\)
  7. \begin{equation*} \uvec_1=\fourvec{1/2}{1/2}{1/2}{1/2},\hspace{24pt} \uvec_2=\fourvec{-1/\sqrt{2}}{1/\sqrt{2}}00,\hspace{24pt} \uvec_3=\fourvec{1/2}{1/2}{-1/2}{-1/2} \end{equation*}
As this activity illustrates, Gram-Schmidt orthogonalization begins with a basis \(\vvec_1, \vvec_2,\ldots,\vvec_n\) for a subspace \(W\) of \(\real^m\) and creates an orthogonal basis for \(W\text{.}\) Let’s work through a second example.

Example 6.4.4.

Let’s start with the basis
\begin{equation*} \vvec_1=\threevec{2}{-1}2,\hspace{24pt} \vvec_2=\threevec{-3}{3}0,\hspace{24pt} \vvec_3=\threevec{-2}71\text{,} \end{equation*}
which is a basis for \(\real^3\text{.}\)
To get started, we’ll simply set \(\wvec_1=\vvec_1=\threevec{2}{-1}2\text{.}\) We construct \(\wvec_2\) from \(\vvec_2\) by subtracting its orthogonal projection onto \(W_1\text{,}\) the line defined by \(\wvec_1\text{:}\)
\begin{equation*} \wvec_2 = \vvec_2 - \proj{\vvec_2}{\wvec_1} \end{equation*}
This gives
\begin{equation*} \wvec_2 = \vvec_2 - \frac{\vvec_2\cdot\wvec_1}{\wvec_1\cdot\wvec_1}\wvec_1 = \vvec_2 + \wvec_1 = \threevec{-1}22\text{.} \end{equation*}
Notice that we found \(\vvec_2 = -\wvec_1 + \wvec_2\text{.}\) Therefore, we can rewrite any linear combination of \(\vvec_1\) and \(\vvec_2\) as
\begin{equation*} c_1\vvec_1 + c_2\vvec_2 = c_1\wvec_1 + c_2(-\wvec_1+\wvec_2) = (c_1-c_2)\wvec_1 + c_2\wvec_2\text{,} \end{equation*}
a linear combination of \(\wvec_1\) and \(\wvec_2\text{.}\) This tells us that
\begin{equation*} W_2 = \laspan{\wvec_1,\wvec_2} = \laspan{\vvec_1,\vvec_2}\text{.} \end{equation*}
In other words, \(\wvec_1\) and \(\wvec_2\) form an orthogonal basis for \(W_2\text{,}\) the 2-dimensional subspace that is the span of \(\vvec_1\) and \(\vvec_2\text{.}\)
Finally, we form \(\wvec_3\) from \(\vvec_3\) by subtracting its orthogonal projection onto \(W_2\text{:}\)
\begin{equation*} \wvec_3 = \vvec_3 - \frac{\vvec_3\cdot\wvec_1}{\wvec_1\cdot\wvec_1}\wvec_1 - \frac{\vvec_3\cdot\wvec_2}{\wvec_2\cdot\wvec_2}\wvec_2 = \vvec_3 + \wvec_1 - 2\wvec_2 = \threevec22{-1}\text{.} \end{equation*}
We can now check that
\begin{equation*} \wvec_1=\threevec2{-1}2,\hspace{24pt} \wvec_2=\threevec{-1}22,\hspace{24pt} \wvec_3=\threevec22{-1} \end{equation*}
is an orthogonal set. Furthermore, we have, as before, \(\laspan{\wvec_1,\wvec_2,\wvec_3} = \laspan{\vvec_1,\vvec_2,\vvec_3}\text{,}\) which says that we have found a new orthogonal basis for \(\real^3\text{.}\)
To create an orthonormal basis, we form unit vectors parallel to each of the vectors in the orthogonal basis:
\begin{equation*} \uvec_1 = \threevec{2/3}{-1/3}{2/3},\hspace{24pt} \uvec_2 = \threevec{-1/3}{2/3}{2/3},\hspace{24pt} \uvec_3 = \threevec{2/3}{2/3}{-1/3}\text{.} \end{equation*}
More generally, if we have a basis \(\vvec_1,\vvec_2,\ldots,\vvec_n\) for a subspace \(W\) of \(\real^m\text{,}\) the Gram-Schmidt algorithm creates an orthogonal basis for \(W\) in the following way:
\begin{align*} \wvec_1 \amp = \vvec_1\\ \wvec_2 \amp = \vvec_2 - \frac{\vvec_2\cdot\wvec_1}{\wvec_1\cdot\wvec_1}\wvec_1\\ \wvec_3 \amp = \vvec_3 - \frac{\vvec_3\cdot\wvec_1}{\wvec_1\cdot\wvec_1}\wvec_1 - \frac{\vvec_3\cdot\wvec_2}{\wvec_2\cdot\wvec_2}\wvec_2\\ \amp \vdots\\ \wvec_n \amp = \vvec_n - \frac{\vvec_n\cdot\wvec_1}{\wvec_1\cdot\wvec_1}\wvec_1 - \frac{\vvec_n\cdot\wvec_2}{\wvec_2\cdot\wvec_2}\wvec_2 - \ldots - \frac{\vvec_n\cdot\wvec_{n-1}} {\wvec_{n-1}\cdot\wvec_{n-1}}\wvec_{n-1}\text{.} \end{align*}
From here, we may form an orthonormal basis by constructing a unit vector parallel to each vector in the orthogonal basis: \(\uvec_j = \frac{1}{\len{\wvec_j}}\wvec_j\text{.}\)

Activity 6.4.3.

Python can automate these computations for us, and we’ll learn a standard numpy approach shortly. But first let’s see how we can code up the Gram-Schmidt process directly. The code below allows vectors to be given as 1-d arrays or as 1-column matrices; flatten() converts either to the 1-d representation. The basis is provided as a Python list of vectors (in either representation).
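Here is one way to write it; this is a sketch, and the function names gram_schmidt and normalize are our own choices.

import numpy as np

def gram_schmidt(basis):
    # Accept each vector as a 1-d array or a 1-column matrix;
    # flatten() converts either to the 1-d representation.
    ws = []
    for v in basis:
        w = np.asarray(v, dtype=float).flatten()
        # remove the component of w along each previously constructed vector
        for u in ws:
            w = w - (w @ u) / (u @ u) * u
        ws.append(w)
    return ws

def normalize(vectors):
    # scale each vector to unit length, giving an orthonormal basis
    return [w / np.linalg.norm(w) for w in vectors]

# quick check using the basis from Preview Activity 6.4.1
w1, w2 = gram_schmidt([np.array([1, 1]), np.array([0, 2])])
print(w1, w2)   # [1. 1.] [-1.  1.]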
Let’s now consider \(W\text{,}\) the subspace of \(\real^5\) having basis
\begin{equation*} \vvec_1 = \fivevec{14}{-6}{8}2{-6},\hspace{24pt} \vvec_2 = \fivevec{5}{-3}{4}3{-7},\hspace{24pt} \vvec_3 = \fivevec{2}30{-2}1. \end{equation*}
  1. Apply the Gram-Schmidt algorithm to find an orthogonal basis \(\wvec_1\text{,}\) \(\wvec_2\text{,}\) and \(\wvec_3\) for \(W\text{.}\)
  2. Find \(\bhat\text{,}\) the orthogonal projection of \(\bvec = \fivevec{-5}{11}0{-1}5\) onto \(W\text{.}\)
  3. Explain why we know that \(\bhat\) is a linear combination of the original vectors \(\vvec_1\text{,}\) \(\vvec_2\text{,}\) and \(\vvec_3\) and then find weights so that
    \begin{equation*} \bhat = c_1\vvec_1 + c_2\vvec_2 + c_3\vvec_3. \end{equation*}
  4. Find an orthonormal basis \(\uvec_1\text{,}\) \(\uvec_2\text{,}\) and \(\uvec_3\) for \(W\) and form the matrix \(Q\) whose columns are these vectors.
  5. Find the product \(Q^{\transpose}Q\) and explain the result.
  6. Find the matrix \(P\) that projects vectors orthogonally onto \(W\) and verify that \(P\bvec\) gives \(\bhat\text{,}\) the orthogonal projection that you found earlier.
Answer.
  1. \begin{equation*} \wvec_1=\fivevec{14}{-6}82{-6},\hspace{24pt} \wvec_2=\fivevec{-2}{0}02{-4},\hspace{24pt} \wvec_3=\fivevec130{-1}{-1} \end{equation*}
  2. \(\displaystyle \bhat=\fivevec{-4}9{-4}{-4}3\)
  3. \(\bhat=-\frac34\vvec_1+\frac12\vvec_2+2\vvec_3\text{.}\)
  4. \begin{equation*} \uvec_1=\fivevec{14/\sqrt{336}}{-6/\sqrt{336}}{8/\sqrt{336}} {2/\sqrt{336}}{-6/\sqrt{336}},\hspace{24pt} \uvec_2=\fivevec{-2/\sqrt{24}}{0}0{2/\sqrt{24}} {-4/\sqrt{24}},\hspace{24pt} \uvec_3=\fivevec{1/\sqrt{12}}{3/\sqrt{12}}0 {-1/\sqrt{12}}{-1/\sqrt{12}} \end{equation*}
  5. \(\displaystyle Q^{\transpose}Q=I\)
  6. \(\displaystyle P = QQ^{\transpose} = \begin{bmatrix} 5/6 \amp 0 \amp 1/3 \amp -1/6 \amp 0 \\ 0 \amp 6/7 \amp -1/7 \amp -2/7 \amp -1/7 \\ 1/3 \amp -1/7 \amp 4/21 \amp 1/21 \amp -1/7 \\ -1/6 \amp -2/7 \amp 1/21 \amp 11/42 \amp -2/7 \\ 0 \amp -1/7 \amp -1/7 \amp -2/7 \amp 6/7 \end{bmatrix}\)
Solution.
  1. \begin{equation*} \wvec_1=\fivevec{14}{-6}82{-6},\hspace{24pt} \wvec_2=\fivevec{-2}{0}02{-4},\hspace{24pt} \wvec_3=\fivevec130{-1}{-1} \end{equation*}
  2. \(\displaystyle \bhat=\fivevec{-4}9{-4}{-4}3\)
  3. We know that \(\bhat\) is in \(W\) and \(\vvec_1\text{,}\) \(\vvec_2\text{,}\) and \(\vvec_3\) is a basis for \(W\text{.}\) We find \(\bhat=-\frac34\vvec_1+\frac12\vvec_2+2\vvec_3\text{.}\)
  4. \begin{equation*} \uvec_1=\fivevec{14/\sqrt{336}}{-6/\sqrt{336}}{8/\sqrt{336}} {2/\sqrt{336}}{-6/\sqrt{336}},\hspace{24pt} \uvec_2=\fivevec{-2/\sqrt{24}}{0}0{2/\sqrt{24}}{-4/\sqrt{24}},\hspace{24pt} \uvec_3=\fivevec{1/\sqrt{12}}{3/\sqrt{12}}0{-1/\sqrt{12}}{-1/\sqrt{12}} \end{equation*}
  5. \(Q^{\transpose}Q=I\) since the matrix product computes the dot products of the columns of \(Q\text{.}\)
  6. \(\displaystyle P = QQ^{\transpose} = \begin{bmatrix} 5/6 \amp 0 \amp 1/3 \amp -1/6 \amp 0 \\ 0 \amp 6/7 \amp -1/7 \amp -2/7 \amp -1/7 \\ 1/3 \amp -1/7 \amp 4/21 \amp 1/21 \amp -1/7 \\ -1/6 \amp -2/7 \amp 1/21 \amp 11/42 \amp -2/7 \\ 0 \amp -1/7 \amp -1/7 \amp -2/7 \amp 6/7 \end{bmatrix}\)

Subsection 6.4.2 \(QR\) factorizations

Now that we’ve seen how the Gram-Schmidt algorithm forms an orthonormal basis for a given subspace, we will explore how the algorithm leads to an important matrix factorization known as the \(QR\) factorization. The \(QR\) factorization can be computed in almost any computer package that does linear algebra.

Activity 6.4.4.

Suppose that \(A\) is the \(4\by3\) matrix whose columns are
\begin{equation*} \vvec_1 = \fourvec1111,\hspace{24pt} \vvec_2 = \fourvec1322,\hspace{24pt} \vvec_3 = \fourvec1{-3}{-3}{-3}\text{.} \end{equation*}
These vectors form a basis for \(W\text{,}\) the subspace of \(\real^4\) that we encountered in Activity 6.4.2. Since these vectors are the columns of \(A\text{,}\) we have \(\col(A) = W\text{.}\)
  1. When we implemented Gram-Schmidt, we first found an orthogonal basis \(\wvec_1\text{,}\) \(\wvec_2\text{,}\) and \(\wvec_3\) using
    \begin{equation*} \begin{aligned} \wvec_1 \amp = \vvec_1 \\ \wvec_2 \amp = \vvec_2 - \proj{\vvec_2}{\wvec_1} \\ \wvec_3 \amp = \vvec_3 - \proj{\vvec_3}{\wvec_1} - \proj{\vvec_3}{\wvec_2} \\ \end{aligned}\text{.} \end{equation*}
    That is,
    \begin{equation*} \begin{aligned} \wvec_2 \amp = \vvec_2 - \frac{\vvec_2\cdot\wvec_1}{\wvec_1\cdot\wvec_1}\wvec_1 \\ \wvec_3 \amp = \vvec_3 - \frac{\vvec_3\cdot\wvec_1}{\wvec_1\cdot\wvec_1}\wvec_1 - \frac{\vvec_3\cdot\wvec_2}{\wvec_2\cdot\wvec_2}\wvec_2 \\ \end{aligned}\text{.} \end{equation*}
    Use these expressions to write \(\vvec_1\text{,}\) \(\vvec_2\text{,}\) and \(\vvec_3\) as linear combinations of \(\wvec_1\text{,}\) \(\wvec_2\text{,}\) and \(\wvec_3\text{.}\)
  2. We next normalized the orthogonal basis \(\wvec_1\text{,}\) \(\wvec_2\text{,}\) and \(\wvec_3\) to obtain an orthonormal basis \(\uvec_1\text{,}\) \(\uvec_2\text{,}\) and \(\uvec_3\text{.}\)
    Write the vectors \(\wvec_i\) as scalar multiples of \(\uvec_i\text{.}\) Then use these expressions to write \(\vvec_1\text{,}\) \(\vvec_2\text{,}\) and \(\vvec_3\) as linear combinations of \(\uvec_1\text{,}\) \(\uvec_2\text{,}\) and \(\uvec_3\text{.}\)
  3. Suppose that \(Q = \left[ \begin{array}{ccc} \uvec_1 \amp \uvec_2 \amp \uvec_3 \end{array} \right]\text{.}\) Use the result of the previous part to find a vector \(\rvec_1\) so that \(Q\rvec_1 = \vvec_1\text{.}\)
  4. Then find vectors \(\rvec_2\) and \(\rvec_3\) such that \(Q\rvec_2 = \vvec_2\) and \(Q\rvec_3 = \vvec_3\text{.}\)
  5. Construct the matrix \(R = \left[ \begin{array}{ccc} \rvec_1 \amp \rvec_2 \amp \rvec_3 \end{array} \right]\text{.}\) Remembering that \(A = \left[ \begin{array}{ccc} \vvec_1 \amp \vvec_2 \amp \vvec_3 \end{array} \right]\text{,}\) explain why \(A = QR\text{.}\)
  6. What is special about the shape of \(R\text{?}\)
  7. Suppose that \(A\) is a \(10\by 6\) matrix whose columns are linearly independent. This means that the columns of \(A\) form a basis for \(W=\col(A)\text{,}\) a 6-dimensional subspace of \(\real^{10}\text{.}\) Suppose that we apply Gram-Schmidt orthogonalization to create an orthonormal basis whose vectors form the columns of \(Q\) and that we write \(A=QR\text{.}\) What are the shapes of \(Q\) and \(R\text{?}\)
Answer.
  1. \begin{equation*} \begin{aligned} \vvec_1 \amp = \wvec_1 \\ \vvec_2 \amp = 2\wvec_1 + \wvec_2 \\ \vvec_3 \amp = -2\wvec_1 - 2\wvec_2 + \wvec_3 \\ \end{aligned} \end{equation*}
  2. \begin{equation*} \begin{aligned} \vvec_1 \amp = 2\uvec_1 \\ \vvec_2 \amp = 4\uvec_1 + \sqrt{2}\uvec_2 \\ \vvec_3 \amp = -4\uvec_1 - 2\sqrt{2}\uvec_2 + 2\uvec_3 \\ \end{aligned} \end{equation*}
  3. \(\vvec_1 = Q\threevec{2}00\text{.}\)
  4. \(\vvec_2 = Q\threevec4{\sqrt{2}}0\) and \(\vvec_3=Q\threevec{-4}{-2\sqrt{2}}2\text{.}\)
  5. We have \(QR = \begin{bmatrix} Q\rvec_1 \amp Q\rvec_2 \amp Q\rvec_3 \end{bmatrix} = A\text{.}\)
  6. \(R\) is upper triangular.
  7. \(Q\) will be \(10\by6\) and \(R\) will be \(6\by 6\text{.}\)
Solution.
  1. \begin{equation*} \begin{aligned} \wvec_1 \amp = \vvec_1 \\ \wvec_2 \amp = \vvec_2 - 2\wvec_1 \\ \wvec_3 \amp = \vvec_3 + 2\wvec_1 + 2\wvec_2 \\ \end{aligned} \end{equation*}
    Therefore,
    \begin{equation*} \begin{aligned} \vvec_1 \amp = \wvec_1 \\ \vvec_2 \amp = 2\wvec_1 + \wvec_2 \\ \vvec_3 \amp = -2\wvec_1 - 2\wvec_2 + \wvec_3 \\ \end{aligned} \end{equation*}
  2. \(\wvec_i = \len{\wvec_i}\uvec_i\) so we have
    \begin{equation*} \wvec_1 = 2\uvec_1,\hspace{24pt} \wvec_2 = \sqrt{2}\uvec_2,\hspace{24pt} \wvec_3 = 2\uvec_3\text{.} \end{equation*}
    This leads to
    \begin{equation*} \begin{aligned} \vvec_1 \amp = 2\uvec_1 \\ \vvec_2 \amp = 4\uvec_1 + \sqrt{2}\uvec_2 \\ \vvec_3 \amp = -4\uvec_1 - 2\sqrt{2}\uvec_2 + 2\uvec_3 \\ \end{aligned} \end{equation*}
  3. Since \(\vvec_1 = 2\uvec_1\text{,}\) we have \(\vvec_1 = Q\threevec{2}00\text{.}\)
  4. In the same way, we have \(\vvec_2 = Q\threevec4{\sqrt{2}}0\) and \(\vvec_3=Q\threevec{-4}{-2\sqrt{2}}2\text{.}\)
  5. We have \(QR = \begin{bmatrix} Q\rvec_1 \amp Q\rvec_2 \amp Q\rvec_3 \end{bmatrix} = A\text{.}\)
  6. \(R=\begin{bmatrix} 2 \amp 4 \amp -4 \\ 0 \amp \sqrt{2} \amp -2\sqrt{2} \\ 0 \amp 0 \amp 2 \\ \end{bmatrix}\) so \(R\) is upper triangular.
  7. \(Q\) will be \(10\by6\) and \(R\) will be a \(6\by 6\) upper triangular matrix.
When the columns of a matrix \(A\) are linearly independent, they form a basis for \(\col(A)\) so that we can perform the Gram-Schmidt algorithm. The previous activity shows how this leads to a factorization of \(A\) as the product of a matrix \(Q\) whose columns are an orthonormal basis for \(\col(A)\) and an upper triangular matrix \(R\text{.}\)

Note 6.4.6. Reduced and full \(QR\) factorizations.

There are two versions of the \(QR\) factorization. The version introduced here is sometimes called the reduced \(QR\) factorization. When \(A\) is \(m \by n\) and \(m > n\text{,}\) the full \(QR\) factorization can be created by augmenting \(Q\) to a square (\(m \by m\)) orthogonal matrix \(\begin{bmatrix} Q \amp Q_+ \end{bmatrix}\text{.}\) Similarly, we can add rows of 0’s below \(R\) to create an \(m \by n\) matrix. When we multiply these expanded matrices, we get
\begin{equation*} \begin{bmatrix} Q \amp Q_+ \end{bmatrix} \begin{bmatrix} R \\ 0 \end{bmatrix} = A \end{equation*}
because the columns of \(Q_+\) are multiplied only by the 0’s that we added below \(R\text{.}\) We can find a suitable \(Q_+\) by finding any orthonormal basis for \(\col(A)^{\perp} = \nul(A^{\transpose})\text{.}\)
When computing \(QR\) factorization in software, it is important to know which form will be returned. For many purposes, the reduced form suffices, and because it is smaller, it is preferable for computations.
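For instance, numpy returns the reduced factorization by default, while mode='complete' requests the full one. A brief sketch:

import numpy as np

A = np.random.rand(7, 4)   # a 7x4 matrix whose columns are (almost surely) independent

Q, R = np.linalg.qr(A)                     # reduced: Q is 7x4, R is 4x4
Qf, Rf = np.linalg.qr(A, mode='complete')  # full: Qf is 7x7, Rf is 7x4

print(Q.shape, R.shape)      # (7, 4) (4, 4)
print(Qf.shape, Rf.shape)    # (7, 7) (7, 4)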

Example 6.4.7.

We’ll consider the matrix \(A=\begin{bmatrix} 2 \amp -3 \amp -2 \\ -1 \amp 3 \amp 7 \\ 2 \amp 0 \amp 1 \\ \end{bmatrix}\) whose columns, which we’ll denote \(\vvec_1\text{,}\) \(\vvec_2\text{,}\) and \(\vvec_3\text{,}\) are the basis of \(\real^3\) that we considered in Example 6.4.4. There we found an orthogonal basis \(\wvec_1\text{,}\) \(\wvec_2\text{,}\) and \(\wvec_3\) that satisfied
\begin{align*} \vvec_1 \amp {}={} \wvec_1\\ \vvec_2 \amp {}={} -\wvec_1 + \wvec_2\\ \vvec_3 \amp {}={} -\wvec_1 + 2\wvec_2 + \wvec _3\text{.} \end{align*}
In terms of the resulting orthonormal basis \(\uvec_1\text{,}\) \(\uvec_2\text{,}\) and \(\uvec_3\text{,}\) we had
\begin{equation*} \wvec_1 = 3 \uvec_1,\hspace{24pt} \wvec_2 = 3 \uvec_2,\hspace{24pt} \wvec_3 = 3 \uvec_3 \end{equation*}
so that
\begin{align*} \vvec_1 \amp {}={} 3\uvec_1\\ \vvec_2 \amp {}={} -3\uvec_1 + 3\uvec_2\\ \vvec_3 \amp {}={} -3\uvec_1 + 6\uvec_2 + 3\uvec _3\text{.} \end{align*}
Therefore, if \(Q=\begin{bmatrix} \uvec_1 \amp \uvec_2 \amp \uvec_3 \end{bmatrix}\text{,}\) we have the \(QR\) factorization
\begin{equation*} A = Q\begin{bmatrix} 3 \amp -3 \amp -3 \\ 0 \amp 3 \amp 6 \\ 0 \amp 0 \amp 3 \\ \end{bmatrix} =QR\text{.} \end{equation*}
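We can verify this factorization numerically. The sketch below checks that \(QR=A\) and that the columns of \(Q\) are orthonormal:

import numpy as np

A = np.array([[2, -3, -2],
              [-1, 3, 7],
              [2, 0, 1]])
Q = np.array([[2, -1, 2],
              [-1, 2, 2],
              [2, 2, -1]]) / 3   # columns are u1, u2, u3
R = np.array([[3, -3, -3],
              [0, 3, 6],
              [0, 0, 3]])

print(np.allclose(Q @ R, A))             # True
print(np.allclose(Q.T @ Q, np.eye(3)))   # True since Q has orthonormal columns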
The value of the \(QR\) factorization will become clear in the next section where we use it to solve least squares problems.

Activity 6.4.5.

We would like to use Python to automate the process of finding and using the \(QR\) factorization of a matrix \(A\text{.}\) Q, R = numpy.linalg.qr(A) will do the trick.
Suppose that \(A\) is the following matrix whose columns are linearly independent.
\begin{equation*} A = \begin{bmatrix} 1 \amp 0 \amp -3 \\ 0 \amp 2 \amp -1 \\ 1 \amp 0 \amp 1 \\ 1 \amp 3 \amp 5 \\ \end{bmatrix}. \end{equation*}
  1. If \(A=QR\text{,}\) what are the shapes of \(Q\) and of \(R\text{?}\) What is special about the form of \(R\text{?}\)
  2. Find the \(QR\) factorization of \(A\) and verify that (a) \(Q\) and \(R\) have the anticipated shapes and (b) \(QR = A\text{.}\)
  3. Find the matrix \(P\) that orthogonally projects vectors onto \(\col(A)\text{.}\)
  4. Find \(\bhat\text{,}\) the orthogonal projection of \(\bvec=\fourvec4{-17}{-14}{22}\) onto \(\col(A)\text{.}\)
  5. Explain why the equation \(A\xvec=\bhat\) must be consistent and then find \(\xvec\text{.}\)
Answer.
  1. \(Q\) is \(4\by3\) and \(R\) is a \(3\by3\) upper triangular matrix.
  2. We see that
    \begin{equation*} Q=\begin{bmatrix} 1/\sqrt{3} \amp -1/\sqrt{10} \amp -3/\sqrt{23} \\ 0 \amp 2/\sqrt{10} \amp -3/\sqrt{23} \\ 1/\sqrt{3} \amp -1/\sqrt{10} \amp 1/\sqrt{23} \\ 1/\sqrt{3} \amp 2/\sqrt{10} \amp 2/\sqrt{23} \\ \end{bmatrix},\hspace{24pt} R = \begin{bmatrix} \sqrt{3} \amp \sqrt{3} \amp \sqrt{3} \\ 0 \amp \sqrt{10} \amp \sqrt{10} \\ 0 \amp 0 \amp \sqrt{23} \end{bmatrix} \end{equation*}
  3. \(\displaystyle P=QQ^{\transpose} = \begin{bmatrix} 569/690 \amp 22/115 \amp 209/690 \amp -44/345 \\ 22/115 \amp 91/115 \amp -38/115 \amp 16/115 \\ 209/690 \amp -38/115 \amp 329/690 \amp 76/345 \\ -44/345 \amp 16/115 \amp 76/345 \amp 313/345 \\ \end{bmatrix}\)
  4. \(\displaystyle \bhat = \fourvec{-7}{-5}5{14}\)
  5. \(\xvec = \threevec2{-1}3\text{.}\)
Solution.
  1. \(Q\) is \(4\by3\) and \(R\) is a \(3\by3\) upper triangular matrix.
  2. We see that
    \begin{equation*} Q=\begin{bmatrix} 1/\sqrt{3} \amp -1/\sqrt{10} \amp -3/\sqrt{23} \\ 0 \amp 2/\sqrt{10} \amp -3/\sqrt{23} \\ 1/\sqrt{3} \amp -1/\sqrt{10} \amp 1/\sqrt{23} \\ 1/\sqrt{3} \amp 2/\sqrt{10} \amp 2/\sqrt{23} \\ \end{bmatrix},\hspace{24pt} R = \begin{bmatrix} \sqrt{3} \amp \sqrt{3} \amp \sqrt{3} \\ 0 \amp \sqrt{10} \amp \sqrt{10} \\ 0 \amp 0 \amp \sqrt{23} \end{bmatrix} \end{equation*}
  3. \(\displaystyle P=QQ^{\transpose} = \begin{bmatrix} 569/690 \amp 22/115 \amp 209/690 \amp -44/345 \\ 22/115 \amp 91/115 \amp -38/115 \amp 16/115 \\ 209/690 \amp -38/115 \amp 329/690 \amp 76/345 \\ -44/345 \amp 16/115 \amp 76/345 \amp 313/345 \\ \end{bmatrix}\)
  4. \(\displaystyle \bhat = QQ^{\transpose}\bvec = \fourvec{-7}{-5}5{14}\)
  5. Since \(\bhat\) is in \(\col(A)\text{,}\) the system \(A\xvec=\bhat\) must be consistent. We find a solution by augmenting \(A\) by \(\bhat\) and row reducing: \(\xvec = \threevec2{-1}3\text{.}\)
Here is a sketch of Python code that performs the necessary calculations.
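import numpy as np

A = np.array([[1, 0, -3],
              [0, 2, -1],
              [1, 0, 1],
              [1, 3, 5]], dtype=float)
b = np.array([4, -17, -14, 22], dtype=float)

Q, R = np.linalg.qr(A)           # reduced factorization: Q is 4x3, R is 3x3
print(np.allclose(Q @ R, A))     # verify that QR reproduces A

P = Q @ Q.T                      # projects vectors orthogonally onto col(A)
bhat = P @ b                     # the orthogonal projection of b
print(bhat)                      # approximately [-7. -5.  5. 14.]

# A x = bhat is equivalent to R x = Q^T b, a triangular system
x = np.linalg.solve(R, Q.T @ b)
print(x)                         # approximately [ 2. -1.  3.]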

Subsection 6.4.3 Summary

This section explored the Gram-Schmidt orthogonalization algorithm and how it leads to the matrix factorization \(A=QR\) when the columns of \(A\) are linearly independent.
  • Beginning with a basis \(\vvec_1, \vvec_2,\ldots,\vvec_n\) for a subspace \(W\) of \(\real^m\text{,}\) the vectors
    \begin{align*} \wvec_1 \amp = \vvec_1\\ \wvec_2 \amp = \vvec_2 - \proj{\vvec_2}{\wvec_1}\\ \wvec_3 \amp = \vvec_3 - \proj{\vvec_3}{\wvec_1} - \proj{\vvec_3}{\wvec_2}\\ \amp \vdots\\ \wvec_n \amp = \vvec_n - \proj{\vvec_n}{\wvec_1} - \proj{\vvec_n}{\wvec_2} - \cdots - \proj{\vvec_n}{\wvec_{n-1}} \end{align*}
    form an orthogonal basis for \(W\text{.}\)
  • We may scale each vector \(\wvec_i\) appropriately to obtain an orthonormal basis \(\uvec_1,\uvec_2,\ldots,\uvec_n\text{.}\)
  • Expressing the Gram-Schmidt algorithm in matrix form shows that, if the columns of \(A\) are linearly independent, then we can write \(A=QR\text{,}\) where the columns of \(Q\) form an orthonormal basis for \(\col(A)\) and \(R\) is upper triangular. \(R\) is upper triangular because each \(\wvec_i\) only depends on \(\vvec_j\) for \(j \le i\text{.}\)
  • numpy.linalg.qr() will compute \(QR\) factorizations for us.

Exercises 6.4.4 Exercises

1.

Suppose that a subspace \(W\) of \(\real^3\) has a basis formed by
\begin{equation*} \vvec_1=\threevec111, \hspace{24pt} \vvec_2=\threevec1{-2}{-2}. \end{equation*}
  1. Find an orthogonal basis for \(W\text{.}\)
  2. Find an orthonormal basis for \(W\text{.}\)
  3. Find the matrix \(P\) that projects vectors orthogonally onto \(W\text{.}\)
  4. Find the orthogonal projection of \(\threevec34{-2}\) onto \(W\text{.}\)

2.

Find the \(QR\) factorization of \(A=\begin{bmatrix} 4 \amp 7 \\ -2 \amp 4 \\ 4 \amp 4 \end{bmatrix} \text{.}\)

3.

Consider the basis of \(\real^3\) given by the vectors
\begin{equation*} \vvec_1=\threevec2{-2}2,\hspace{24pt} \vvec_2=\threevec{-1}{-3}1,\hspace{24pt} \vvec_3=\threevec{2}0{-5}. \end{equation*}
  1. Apply the Gram-Schmidt orthogonalization algorithm to find an orthonormal basis \(\uvec_1\text{,}\) \(\uvec_2\text{,}\) \(\uvec_3\) for \(\real^3\text{.}\)
  2. If \(A\) is the \(3\by3\) matrix whose columns are \(\vvec_1\text{,}\) \(\vvec_2\text{,}\) and \(\vvec_3\text{,}\) find the \(QR\) factorization of \(A\text{.}\)
  3. Suppose that we want to solve the equation \(A\xvec=\bvec = \threevec{-9}17\text{,}\) which we can rewrite as \(QR\xvec = \bvec\text{.}\)
    1. If we set \(\yvec=R\xvec\text{,}\) the equation \(QR\xvec=\bvec\) becomes \(Q\yvec=\bvec\text{.}\) Explain how to solve the equation \(Q\yvec=\bvec\) in a computationally efficient manner.
    2. Explain how to solve the equation \(R\xvec=\yvec\) in a computationally efficient manner.
    3. Find the solution \(\xvec\) by first solving \(Q\yvec = \bvec\) and then \(R\xvec = \yvec\text{.}\)

4.

Consider the vectors
\begin{equation*} \vvec_1=\fivevec1{-1}{-1}11,\hspace{24pt} \vvec_2=\fivevec2{1}{4}{-4}2,\hspace{24pt} \vvec_3=\fivevec5{-4}{-3}71 \end{equation*}
and the subspace \(W\) of \(\real^5\) that they span.
  1. Find an orthonormal basis for \(W\text{.}\)
  2. Find the \(5\by5\) matrix that projects vectors orthogonally onto \(W\text{.}\)
  3. Find \(\bhat\text{,}\) the orthogonal projection of \(\bvec=\fivevec{-8}3{-12}8{-4}\) onto \(W\text{.}\)
  4. Express \(\bhat\) as a linear combination of \(\vvec_1\text{,}\) \(\vvec_2\text{,}\) and \(\vvec_3\text{.}\)

5.

Consider the set of vectors
\begin{equation*} \vvec_1=\threevec211,\hspace{24pt} \vvec_2=\threevec122,\hspace{24pt} \vvec_3=\threevec300. \end{equation*}
  1. What happens when we apply the Gram-Schmidt orthogonalization algorithm?
  2. Why does the algorithm fail to produce an orthogonal basis for \(\real^3\text{?}\)

6.

Suppose that \(A\) is a matrix with linearly independent columns and having the factorization \(A=QR\text{.}\) Determine whether the following statements are true or false and explain your thinking.
  1. It follows that \(R=Q^{\transpose}A\text{.}\)
  2. The matrix \(R\) is invertible.
  3. The product \(Q^{\transpose}Q\) projects vectors orthogonally onto \(\col(A)\text{.}\)
  4. The columns of \(Q\) are an orthogonal basis for \(\col(A)\text{.}\)
  5. The orthogonal complement \(\col(A)^\perp = \nul(Q^{\transpose})\text{.}\)

7.

Suppose we have the \(QR\) factorization \(A=QR\text{,}\) where \(A\) is a \(7\by 4\) matrix.
  1. What is the shape of the product \(QQ^{\transpose}\text{?}\) Explain the significance of this product.
  2. What is the shape of the product \(Q^{\transpose}Q\text{?}\) Explain the significance of this product.
  3. What is the shape of the matrix \(R\text{?}\)
  4. If \(R\) is a diagonal matrix, what can you say about the columns of \(A\text{?}\)

8.

Suppose we have the \(QR\) factorization \(A=QR\) where the columns of \(A\) are \(\avec_1,\avec_2,\ldots,\avec_n\) and the columns of \(R\) are \(\rvec_1,\rvec_2,\ldots,\rvec_n\text{.}\)
  1. How can the matrix product \(A^{\transpose}A\) be expressed in terms of dot products?
  2. How can the matrix product \(R^{\transpose}R\) be expressed in terms of dot products?
  3. Explain why \(A^{\transpose}A=R^{\transpose}R\text{.}\)
  4. Explain why the dot products \(\avec_i\cdot\avec_j = \rvec_i\cdot\rvec_j\text{.}\)