Subsection 5.6.1 The power method
Our goal is to find a technique that produces numerical approximations to the eigenvalues and associated eigenvectors of a matrix \(A\text{.}\) We begin by searching for the eigenvalue having the largest absolute value, which is called the dominant eigenvalue. The next two examples demonstrate this technique.
Example 5.6.1.
Let’s begin with the positive stochastic matrix
\(A=\left[\begin{array}{rr}
0.7 \amp 0.6 \\
0.3 \amp 0.4 \\
\end{array}\right]
\text{.}\) We spent quite a bit of time studying this type of matrix in
Section 5.5; in particular, we saw that any Markov chain will converge to the unique steady state vector. Let’s rephrase this statement in terms of the eigenvectors of
\(A\text{.}\)
This matrix has eigenvalues \(\lambda_1 = 1\) and \(\lambda_2 =0.1\) so the dominant eigenvalue is \(\lambda_1 = 1\text{.}\) The associated eigenvectors are \(\vvec_1 =
\twovec{2}{1}\) and \(\vvec_2 = \twovec{-1}{1}\text{.}\) Suppose we begin with the vector
\begin{equation*}
\xvec_0 = \twovec{1}{0} = \frac13 \vvec_1 - \frac13 \vvec_2
\end{equation*}
and find
\begin{equation*}
\begin{aligned}
\xvec_1 \amp {}={} A\xvec_0 = \frac13 \vvec_1 - \frac13(0.1)
\vvec_2 \\
\xvec_2 \amp {}={} A^2\xvec_0 = \frac13 \vvec_1 - \frac13(0.1)^2
\vvec_2 \\
\xvec_3 \amp {}={} A^3\xvec_0 = \frac13 \vvec_1 - \frac13(0.1)^3
\vvec_2 \\
\amp \vdots \\
\xvec_k \amp {}={} A^k\xvec_0 = \frac13 \vvec_1 - \frac13(0.1)^k
\vvec_2 \\
\end{aligned}
\end{equation*}
and so forth. Notice that the powers \(0.1^k\) become increasingly small as \(k\) grows so that \(\xvec_k\approx
\frac13\vvec_1\) when \(k\) is large. Therefore, the vectors \(\xvec_k\) become increasingly close to a vector in the eigenspace \(E_1\text{,}\) the eigenspace associated to the dominant eigenvalue. If we did not know the eigenvector \(\vvec_1\text{,}\) we could use a Markov chain in this way to find a basis vector for \(E_1\text{,}\) which is essentially how the Google PageRank algorithm works.
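Although the text's interactive cells are not reproduced here, the following minimal sketch (assuming NumPy) carries out this Markov chain numerically. The iterates \(\xvec_k = A^k\xvec_0\) settle down to \(\frac13\vvec_1 = \twovec{2/3}{1/3}\text{,}\) a vector in the eigenspace \(E_1\text{.}\)

import numpy as np

# The positive stochastic matrix from Example 5.6.1.
A = np.array([[0.7, 0.6],
              [0.3, 0.4]])

# Initial vector x_0 = (1, 0).
x = np.array([1.0, 0.0])

# Repeatedly multiply by A; the iterates approach (1/3) v_1 = (2/3, 1/3).
for k in range(1, 11):
    x = A @ x
    print(k, x)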
Example 5.6.2.
Let’s now look at the matrix \(A = \left[\begin{array}{rr}
2 \amp 1 \\ 1 \amp 2 \\ \end{array}\right] \text{,}\) which has eigenvalues \(\lambda_1=3\) and \(\lambda_2 = 1\text{.}\) The dominant eigenvalue is \(\lambda_1=3\text{,}\) and the associated eigenvectors are \(\vvec_1 = \twovec{1}{1}\) and \(\vvec_{2} =
\twovec{-1}{1}\text{.}\) Once again, begin with the vector \(\xvec_0 = \twovec{1}{0}=\frac12 \vvec_1 - \frac12 \vvec_2\) so that
\begin{equation*}
\begin{aligned}
\xvec_1 \amp {}={} A\xvec_0 = 3\frac12 \vvec_1 - \frac12
\vvec_2 \\
\xvec_2 \amp {}={} A^2\xvec_0 = 3^2\frac12 \vvec_1 - \frac12
\vvec_2 \\
\xvec_3 \amp {}={} A^3\xvec_0 = 3^3\frac12 \vvec_1 - \frac12
\vvec_2 \\
\amp \vdots \\
\xvec_k \amp {}={} A^k\xvec_0 = 3^k\frac12 \vvec_1 - \frac12
\vvec_2\text{.} \\
\end{aligned}
\end{equation*}
As the figure shows, the vectors \(\xvec_k\) are stretched by a factor of \(3\) in the \(\vvec_1\) direction and not at all in the \(\vvec_2\) direction. Consequently, the vectors \(\xvec_k\) become increasingly long, but their direction becomes closer to the direction of the eigenvector \(\vvec_1=\twovec{1}{1}\) associated to the dominant eigenvalue.
To find an eigenvector associated to the dominant eigenvalue, we will prevent the length of the vectors \(\xvec_k\) from growing arbitrarily large by multiplying by an appropriate scaling constant. Here is one way to do this. Given the vector \(\xvec_k\text{,}\) we identify its component having the largest absolute value and call it \(m_k\text{.}\) We then define \(\overline{\xvec}_k = \frac{1}{m_k} \xvec_k\text{,}\) which means that the component of \(\overline{\xvec}_k\) having the largest absolute value is \(1\text{.}\)
For example, beginning with \(\xvec_0 = \twovec{1}{0}\text{,}\) we find \(\xvec_1 = A\xvec_{0} =
\twovec{2}{1}\text{.}\) The component of \(\xvec_1\) having the largest absolute value is \(m_1=2\) so we multiply by \(\frac{1}{m_1} = \frac12\) to obtain \(\overline{\xvec}_1 =
\twovec{1}{\frac12}\text{.}\) Then \(\xvec_2 = A\overline{\xvec}_1 =
\twovec{\frac52}{2}\text{.}\) Now the component having the largest absolute value is \(m_2=\frac52\) so we multiply by \(\frac25\) to obtain \(\overline{\xvec}_2 = \twovec{1}{\frac45}\text{.}\)
The resulting sequence of vectors \(\overline{\xvec}_k\) is shown in the figure. Notice how the vectors \(\overline{\xvec}_k\) now approach the eigenvector \(\vvec_1\text{,}\) which gives us a way to find the eigenvector \(\vvec_1=\twovec{1}{1}\text{.}\) This is the power method for finding an eigenvector associated to the dominant eigenvalue of a matrix.
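As a quick check on this normalization step, here is a minimal sketch (again assuming NumPy; not part of the original cells) that repeats the multiply-then-rescale process for Example 5.6.2. The first two steps reproduce \(\overline{\xvec}_1 = \twovec{1}{\frac12}\) and \(\overline{\xvec}_2 = \twovec{1}{\frac45}\) above, and the later vectors approach \(\vvec_1 = \twovec{1}{1}\text{.}\)

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
x = np.array([1.0, 0.0])   # x_0

for k in range(1, 9):
    x = A @ x                        # multiply by A
    m = x[np.argmax(np.abs(x))]      # m_k: component of largest absolute value
    x = x / m                        # rescale so that component becomes 1
    print(k, m, x)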
Activity 5.6.2.
Let’s begin by considering the matrix \(A = \left[\begin{array}{rr}
0.5 \amp 0.2 \\
0.4 \amp 0.7 \\
\end{array}\right]\) and the initial vector \(\xvec_0 = \twovec{1}{0}\text{.}\)
Compute the vector \(\xvec_1 =
A\xvec_0\text{.}\)
Find \(m_1\text{,}\) the component of \(\xvec_1\) that has the largest absolute value. Then form \(\overline{\xvec}_1 =
\frac 1{m_1} \xvec_1\text{.}\) Notice that the component of \(\overline{\xvec}_1\) having the largest absolute value is \(1\text{.}\)
Find the vector \(\xvec_2 = A\overline{\xvec}_1\text{.}\) Identify the component \(m_2\) of \(\xvec_2\) having the largest absolute value. Then form \(\overline{\xvec}_2 =
\frac1{m_2}\xvec_2\) to obtain a vector in which the component with the largest absolute value is \(1\text{.}\)
The Python cell below defines a function that implements the power method. Define the matrix \(A\) and initial vector \(\xvec_0\) below. The command power_eig(A, x0, N)
will compute the multipliers \(m_k\) and the vectors \(\overline{\xvec}_k\) for \(N\) steps of the power method. (A sketch of such a function appears after this activity.)
How does this computation identify an eigenvector of the matrix \(A\text{?}\)
What is the corresponding eigenvalue of this eigenvector?
How do the values of the multipliers \(m_k\) tell us the eigenvalue associated to the eigenvector we have found?
Consider now the matrix \(A=\left[\begin{array}{rr}
-5.1 \amp 5.7 \\
-3.8 \amp 4.4 \\
\end{array}\right]
\text{.}\) Use the power method to find the dominant eigenvalue of \(A\) and an associated eigenvector.
Solution.
We find \(\xvec_1=A\xvec_0 =
\twovec{0.5}{0.4}\text{.}\)
The first component has the largest absolute value so \(m_1=0.5\text{.}\) Therefore, \(\overline{\xvec}_1=\frac1{0.5}\xvec_1 =
\twovec1{0.8}\text{.}\)
In the same way, we obtain \(\xvec_2=A\overline{\xvec}_1 = \twovec{0.66}{0.96}\text{.}\) We see that \(m_2=0.96\) so we have \(\overline{\xvec}_2= \twovec{0.688}{1}\text{.}\)
We see that the vectors \(\overline{\xvec}_k\) are getting closer and closer to \(\twovec{0.5}1\text{,}\) which we therefore identify as an eigenvector associated to the dominant eigenvalue.
We see that \(A\twovec{0.5}{1} =
\twovec{0.45}{0.9}=0.9 \twovec{0.5}1\text{.}\) Therefore, the dominant eigenvalue is \(\lambda_1=0.9\text{.}\)
More generally, we see that the multipliers \(m_k\) will converge to the dominant eigenvalue.
The power method constructs a sequence of vectors \(\overline{\xvec}_k\) converging to an eigenvector \(\vvec_1=\twovec{1}{\frac23}\text{.}\) The multipliers \(m_k\) converge to \(\lambda_1=-1.3\text{,}\) the dominant eigenvalue.
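The Python cell referenced in the activity is not reproduced here; the following is a minimal sketch, assuming NumPy, of how a function with the interface power_eig(A, x0, N) might be written. Applied to the matrix from the last part of the activity, the printed multipliers approach \(-1.3\) and the printed vectors approach \(\twovec{1}{\frac23}\text{.}\)

import numpy as np

def power_eig(A, x0, N):
    # Run N steps of the power method, printing the multiplier m_k and the
    # normalized vector xbar_k at each step.  (A sketch; the cell used in
    # the activity may differ in its details.)
    A = np.array(A, dtype=float)
    x = np.array(x0, dtype=float)
    for k in range(1, N + 1):
        x = A @ x
        m = x[np.argmax(np.abs(x))]   # component of largest absolute value
        x = x / m                     # rescale so that component equals 1
        print(k, m, x)
    return m, x

# The matrix from the last part of the activity.
power_eig([[-5.1, 5.7],
           [-3.8, 4.4]], [1, 0], 20)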
Notice that the power method gives us not only an eigenvector \(\vvec\) but also its associated eigenvalue. As in the activity, consider the matrix \(A=\left[\begin{array}{rr}
-5.1 \amp 5.7 \\
-3.8 \amp 4.4 \\
\end{array}\right]
\text{,}\) which has eigenvector \(\vvec=\twovec{3}{2}\text{.}\) The first component has the largest absolute value so we multiply by \(\frac13\) to obtain \(\overline{\vvec}=\twovec{1}{\frac23}\text{.}\) When we multiply by \(A\text{,}\) we have \(A\overline{\vvec} \approx \twovec{-1.30}{-0.87}\text{.}\) Notice that the first component still has the largest absolute value so that the multiplier \(m=-1.3\) is the eigenvalue \(\lambda\) corresponding to the eigenvector. This demonstrates the fact that the multipliers \(m_k\) approach the eigenvalue \(\lambda\) having the largest absolute value.
Notice that the power method requires us to choose an initial vector
\(\xvec_0\text{.}\) For most choices, this method will find the eigenvalue having the largest absolute value. However, an unfortunate choice of
\(\xvec_0\) may not. For instance, if we had chosen
\(\xvec_0 = \vvec_2 = \twovec{-1}{1}\) in
Example 5.6.2, the vectors in the sequence
\(\xvec_k = A^k\xvec_0=\lambda_2^k\vvec_2\) will not detect the eigenvector
\(\vvec_1\text{.}\) However, it usually happens that our initial guess
\(\xvec_0\) has some contribution from
\(\vvec_1\) that enables us to find it.
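To see this failure numerically, here is a brief sketch (assuming NumPy; not part of the text's cells) that runs the multiply-then-rescale loop of Example 5.6.2 starting from \(\xvec_0 = \vvec_2 = \twovec{-1}{1}\text{.}\) The multipliers settle at \(\lambda_2 = 1\) rather than \(\lambda_1 = 3\text{,}\) and the vectors remain multiples of \(\vvec_2\text{.}\)

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# An unfortunate initial vector: x_0 = v_2, an eigenvector for lambda_2 = 1.
x = np.array([-1.0, 1.0])

for k in range(1, 9):
    x = A @ x
    m = x[np.argmax(np.abs(x))]
    x = x / m
    print(k, m, x)   # m settles at 1, and x stays a multiple of v_2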
The power method, as presented here, will fail for some combinations of matrix \(A\) and initial vector \(\xvec_0\text{.}\) Although the power method does not work for all matrices, there are some conditions that guarantee that it will work and can tell us something about how quickly the method converges.
Suppose we have a diagonalizable matrix
\(A\text{.}\) Because
\(A\) is diagonalizable, by
Proposition 5.3.7, we know that
\(A\) has
\(n\) linearly independent eigenvectors. Let’s choose the eigenvectors so that
\(\len{\vvec_i}_\infty = 1\) for each
\(i\text{.}\) Because the eigenvectors form a basis for
\(\real^n\text{,}\) we can write any
\(\xvec_0\) as
\begin{equation*}
\xvec_0 = c_1 \vvec_1 + c_2 \vvec_2 + \cdots + c_n \vvec_n
\end{equation*}
where \(\vvec_i\) is an eigenvector associated with the \(i\)th eigenvalue (sorted by decreasing absolute value \(|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_n|\)).
Now let’s work out expressions for the unnormalized vectors \(\xvec_1, \xvec_2, \dots \) and normalized vectors \(\overline{\xvec}_1, \overline{\xvec}_2, \dots \) produced by the power method. First notice that for any vector \(\xvec\) and scalar \(m\text{,}\) \(A \frac{\xvec}{m} = \frac{1}{m} A \xvec\text{,}\) so
\begin{equation*}
\overline{\xvec}_k = \frac{\xvec_k}{m_1 m_2 \cdots m_k} \text{.}
\end{equation*}
This means that although the algorithm rescales at every step along the way, algebraically we can do all the scaling at the end, focusing our attention first on what happens to the unnormalized \(\xvec_1, \xvec_2, \dots \text{.}\)
\begin{align*}
\xvec_k = A^k \xvec_0 \amp = A^k c_1 \vvec_1 + A^k c_2 \vvec_2 + \cdots + A^k c_n\vvec_n\\
\amp = c_1 \lambda_1^k \vvec_1 + c_2 \lambda_2^k \vvec_2 + \cdots + c_n \lambda_n^k \vvec_n\\
\amp = \lambda_1^k \left[ c_1 \vvec_1
+ c_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k \vvec_2
+ \cdots
+ c_n \left(\frac{\lambda_n}{\lambda_1}\right)^k \vvec_n \right]\\
\amp \\
\overline{\xvec}_k \amp = \frac{\lambda_1^k}{m_1 m_2 \cdots m_k}
\left[
c_1 \vvec_1
+ c_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k \vvec_2
+ \cdots
+ c_n \left(\frac{\lambda_n}{\lambda_1}\right)^k \vvec_n
\right]\text{.}
\end{align*}
Let’s further assume that we have a dominant eigenvalue (i.e., \(|\lambda_1| > |\lambda_2|\)) and that \(\dim E_{\lambda_1} = 1\text{.}\) We can scale the dominant eigenvector so that \(\len{\vvec_1}_\infty = 1\text{.}\) Since \(\lambda_1\) is dominant, for \(i \ge 2\text{,}\)
\begin{equation*}
\lim_{k \to \infty} \left(\frac{\lambda_i}{\lambda_1}\right)^k = 0\text{.}
\end{equation*}
So the approximation
\begin{equation*}
\overline{\xvec}_k \approx \frac{\lambda_1^k}{m_1 m_2 \cdots m_k} c_1 \vvec_1
\end{equation*}
improves as \(k\) gets larger, as long as \(c_1 \neq 0\text{.}\)
Since \(\len{\overline{\xvec}_k}_\infty = 1 = \len{\vvec_1}_\infty\text{,}\) we know that
\begin{equation*}
\lim_{k \to \infty} \frac{c_1 \lambda_1^k}{m_1 m_2 \cdots m_k} = 1\text{.}
\end{equation*}
From this it follows that
\begin{equation*}
\lim_{k \to \infty} \overline{\xvec}_k = \vvec_1\text{,}
\end{equation*}
the dominant eigenvector. Furthermore, the lagged sequence also converges to 1:
\begin{equation*}
\lim_{k \to \infty} \frac{c_1 \lambda_1^{k-1}}{m_1 m_2 \cdots m_{k-1}} = 1\text{.}
\end{equation*}
Dividing the first sequence by the lagged one leaves \(\frac{c_1 \lambda_1^{k}/(m_1 m_2 \cdots m_k)}{c_1 \lambda_1^{k-1}/(m_1 m_2 \cdots m_{k-1})} = \frac{\lambda_1}{m_k}\text{,}\) so this ratio also converges to 1:
\begin{equation*}
\lim_{k \to \infty} \frac{\lambda_1}{m_k} = 1
\end{equation*}
and
\begin{equation*}
\lim_{k \to \infty} m_k = \lambda_1\text{.}
\end{equation*}
The rate of convergence depends on how quickly \(\left(\frac{\lambda_i}{\lambda_1}\right)^k\) converges to 0. If the magnitude of the dominant eigenvalue is much larger than the magnitudes of the other eigenvalues, then the convergence will be fast. But if \(|\lambda_2|\) is nearly as large as \(|\lambda_1|\text{,}\) convergence will be slower.
It is important to remember that although this argument makes use of the eigenvectors, these will not be known when we are employing the algorithm. In particular, we won’t know that we have chosen \(\xvec_0\) so that \(c_1 \neq 0\text{.}\) But most initial vectors will suffice since the vectors with \(c_1 = 0\) lie in the span of \(\vvec_2, \vvec_3, \dots, \vvec_n\) and so form an \((n-1)\)-dimensional subspace of \(\real^n\text{.}\)
Proposition 5.6.4.
If \(A\) is an \(n \by n\) diagonalizable matrix with a dominant eigenvalue \(\lambda_1\text{,}\) and \(\dim E_{\lambda_1} = 1\text{,}\) then there exists a vector \(\xvec_0 \in \real^n\) such that the sequence of vectors produced by the power method converges to the dominant eigenvector:
\begin{equation*}
\overline{\xvec}_1, \overline{\xvec}_2, \overline{\xvec}_3, \dots, \overline{\xvec}_k, \dots \to \vvec_1
\end{equation*}
and the sequence of scaling factors converges to the dominant eigenvalue:
\begin{equation*}
m_1, m_2, m_3 \dots, m_k, \dots \to \lambda_1\text{.}
\end{equation*}
The convergence is faster when the difference between \(|\lambda_1|\) and the magnitudes of the other eigenvalues is larger.
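To illustrate how the eigenvalue gap affects the speed of convergence, here is a brief sketch (assuming NumPy; the helper function multipliers and the step count are only for this illustration) that tracks how quickly the multipliers \(m_k\) approach \(\lambda_1\) for the matrices of Examples 5.6.1 and 5.6.2, for which \(|\lambda_2/\lambda_1|\) is \(0.1\) and \(\frac13\) respectively.

import numpy as np

def multipliers(A, x0, N):
    # Return the multipliers m_1, ..., m_N produced by the power method.
    x = np.array(x0, dtype=float)
    ms = []
    for _ in range(N):
        x = A @ x
        m = x[np.argmax(np.abs(x))]
        x = x / m
        ms.append(m)
    return ms

x0 = [1.0, 0.0]

# Example 5.6.1: eigenvalues 1 and 0.1, so |lambda_2/lambda_1| = 0.1.
A_fast = np.array([[0.7, 0.6], [0.3, 0.4]])

# Example 5.6.2: eigenvalues 3 and 1, so |lambda_2/lambda_1| = 1/3.
A_slow = np.array([[2.0, 1.0], [1.0, 2.0]])

# Compare how fast |m_k - lambda_1| shrinks in each case.
for label, A, lam1 in [("ratio 0.1", A_fast, 1.0), ("ratio 1/3", A_slow, 3.0)]:
    errors = [abs(m - lam1) for m in multipliers(A, x0, 8)]
    print(label, ["%.2e" % e for e in errors])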