Subsection 5.6.1 The power method
Our goal is to find a technique that produces numerical approximations to the eigenvalues and associated eigenvectors of a matrix \(A\text{.}\) We begin by searching for the eigenvalue having the largest absolute value, which is called the dominant eigenvalue. The next two examples demonstrate this technique.
Example 5.6.1.
Let’s begin with the positive stochastic matrix
\(A=\left[\begin{array}{rr}
0.7 \amp 0.6 \\
0.3 \amp 0.4 \\
\end{array}\right]
\text{.}\) We spent quite a bit of time studying this type of matrix in
Section 5.5; in particular, we saw that any Markov chain will converge to the unique steady state vector. Let’s rephrase this statement in terms of the eigenvectors of
\(A\text{.}\)
This matrix has eigenvalues \(\lambda_1 = 1\) and \(\lambda_2 =0.1\) so the dominant eigenvalue is \(\lambda_1 = 1\text{.}\) The associated eigenvectors are \(\vvec_1 =
\twovec{2}{1}\) and \(\vvec_2 = \twovec{-1}{1}\text{.}\) Suppose we begin with the vector
\begin{equation*}
\xvec_0 = \twovec{1}{0} = \frac13 \vvec_1 - \frac13 \vvec_2
\end{equation*}
and find
\begin{equation*}
\begin{aligned}
\xvec_1 \amp {}={} A\xvec_0 = \frac13 \vvec_1 - \frac13(0.1)
\vvec_2 \\
\xvec_2 \amp {}={} A^2\xvec_0 = \frac13 \vvec_1 - \frac13(0.1)^2
\vvec_2 \\
\xvec_3 \amp {}={} A^3\xvec_0 = \frac13 \vvec_1 - \frac13(0.1)^3
\vvec_2 \\
\amp \vdots \\
\xvec_k \amp {}={} A^k\xvec_0 = \frac13 \vvec_1 - \frac13(0.1)^k
\vvec_2 \\
\end{aligned}
\end{equation*}
and so forth. Notice that the powers \(0.1^k\) become increasingly small as \(k\) grows so that \(\xvec_k\approx
\frac13\vvec_1\) when \(k\) is large. Therefore, the vectors \(\xvec_k\) become increasingly close to a vector in the eigenspace \(E_1\text{,}\) the eigenspace associated to the dominant eigenvalue. If we did not know the eigenvector \(\vvec_1\text{,}\) we could use a Markov chain in this way to find a basis vector for \(E_1\text{,}\) which is essentially how the Google PageRank algorithm works.
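Although the text's interactive cells are not reproduced here, the following minimal sketch (assuming NumPy) carries out this Markov chain numerically. The iterates \(\xvec_k = A^k\xvec_0\) settle down to \(\frac13\vvec_1 = \twovec{2/3}{1/3}\text{,}\) a vector in the eigenspace \(E_1\text{.}\)

import numpy as np

# The positive stochastic matrix from Example 5.6.1.
A = np.array([[0.7, 0.6],
              [0.3, 0.4]])

# Initial vector x_0 = (1, 0).
x = np.array([1.0, 0.0])

# Repeatedly multiply by A; the iterates approach (1/3) v_1 = (2/3, 1/3).
for k in range(1, 11):
    x = A @ x
    print(k, x)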
Example 5.6.2.
Let’s now look at the matrix \(A = \left[\begin{array}{rr}
2 \amp 1 \\ 1 \amp 2 \\ \end{array}\right] \text{,}\) which has eigenvalues \(\lambda_1=3\) and \(\lambda_2 = 1\text{.}\) The dominant eigenvalue is \(\lambda_1=3\text{,}\) and the associated eigenvectors are \(\vvec_1 = \twovec{1}{1}\) and \(\vvec_{2} =
\twovec{-1}{1}\text{.}\) Once again, begin with the vector \(\xvec_0 = \twovec{1}{0}=\frac12 \vvec_1 - \frac12 \vvec_2\) so that
\begin{equation*}
\begin{aligned}
\xvec_1 \amp {}={} A\xvec_0 = 3\frac12 \vvec_1 - \frac12
\vvec_2 \\
\xvec_2 \amp {}={} A^2\xvec_0 = 3^2\frac12 \vvec_1 - \frac12
\vvec_2 \\
\xvec_3 \amp {}={} A^3\xvec_0 = 3^3\frac12 \vvec_1 - \frac12
\vvec_2 \\
\amp \vdots \\
\xvec_k \amp {}={} A^k\xvec_0 = 3^k\frac12 \vvec_1 - \frac12
\vvec_2\text{.} \\
\end{aligned}
\end{equation*}
As the figure shows, the vectors \(\xvec_k\) are stretched by a factor of \(3\) in the \(\vvec_1\) direction and not at all in the \(\vvec_2\) direction. Consequently, the vectors \(\xvec_k\) become increasingly long, but their direction becomes closer to the direction of the eigenvector \(\vvec_1=\twovec{1}{1}\) associated to the dominant eigenvalue.
To find an eigenvector associated to the dominant eigenvalue, we will prevent the length of the vectors \(\xvec_k\) from growing arbitrarily large by multiplying by an appropriate scaling constant. Here is one way to do this. Given the vector \(\xvec_k\text{,}\) we identify its component having the largest absolute value and call it \(m_k\text{.}\) We then define \(\overline{\xvec}_k = \frac{1}{m_k} \xvec_k\text{,}\) which means that the component of \(\overline{\xvec}_k\) having the largest absolute value is \(1\text{.}\)
For example, beginning with \(\xvec_0 = \twovec{1}{0}\text{,}\) we find \(\xvec_1 = A\xvec_{0} =
\twovec{2}{1}\text{.}\) The component of \(\xvec_1\) having the largest absolute value is \(m_1=2\) so we multiply by \(\frac{1}{m_1} = \frac12\) to obtain \(\overline{\xvec}_1 =
\twovec{1}{\frac12}\text{.}\) Then \(\xvec_2 = A\overline{\xvec}_1 =
\twovec{\frac52}{2}\text{.}\) Now the component having the largest absolute value is \(m_2=\frac52\) so we multiply by \(\frac25\) to obtain \(\overline{\xvec}_2 = \twovec{1}{\frac45}\text{.}\)
The resulting sequence of vectors \(\overline{\xvec}_k\) is shown in the figure. Notice how the vectors \(\overline{\xvec}_k\) now approach the eigenvector \(\vvec_1\text{,}\) which gives us a way to find the eigenvector \(\vvec_1=\twovec{1}{1}\text{.}\) This is the power method for finding an eigenvector associated to the dominant eigenvalue of a matrix.
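As a quick check on this normalization step, here is a minimal sketch (again assuming NumPy; not part of the original cells) that repeats the multiply-then-rescale process for Example 5.6.2. The first two steps reproduce \(\overline{\xvec}_1 = \twovec{1}{\frac12}\) and \(\overline{\xvec}_2 = \twovec{1}{\frac45}\) above, and the later vectors approach \(\vvec_1 = \twovec{1}{1}\text{.}\)

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
x = np.array([1.0, 0.0])   # x_0

for k in range(1, 9):
    x = A @ x                        # multiply by A
    m = x[np.argmax(np.abs(x))]      # m_k: component of largest absolute value
    x = x / m                        # rescale so that component becomes 1
    print(k, m, x)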
Activity 5.6.2.
Let’s begin by considering the matrix \(A = \left[\begin{array}{rr}
0.5 \amp 0.2 \\
0.4 \amp 0.7 \\
\end{array}\right]\) and the initial vector \(\xvec_0 = \twovec{1}{0}\text{.}\)
Compute the vector \(\xvec_1 =
A\xvec_0\text{.}\)
Find \(m_1\text{,}\) the component of \(\xvec_1\) that has the largest absolute value. Then form \(\overline{\xvec}_1 =
\frac 1{m_1} \xvec_1\text{.}\) Notice that the component of \(\overline{\xvec}_1\) having the largest absolute value is \(1\text{.}\)
Find the vector \(\xvec_2 = A\overline{\xvec}_1\text{.}\) Identify the component \(m_2\) of \(\xvec_2\) having the largest absolute value. Then form \(\overline{\xvec}_2 =
\frac1{m_2}\xvec_2\) to obtain a vector in which the component with the largest absolute value is \(1\text{.}\)
The Python cell below defines a function that implements the power method. Define the matrix \(A\) and initial vector \(\xvec_0\) below. The command power_eig(A, x0, N)
will compute the multipliers \(m_k\) and the vectors \(\overline{\xvec}_k\) for \(N\) steps of the power method. (A sketch of such a function appears after this activity.)
How does this computation identify an eigenvector of the matrix \(A\text{?}\)
What is the corresponding eigenvalue of this eigenvector?
How do the values of the multipliers \(m_k\) tell us the eigenvalue associated to the eigenvector we have found?
Consider now the matrix \(A=\left[\begin{array}{rr}
-5.1 \amp 5.7 \\
-3.8 \amp 4.4 \\
\end{array}\right]
\text{.}\) Use the power method to find the dominant eigenvalue of \(A\) and an associated eigenvector.
Solution.
We find \(\xvec_1=A\xvec_0 =
\twovec{0.5}{0.4}\text{.}\)
The first component has the largest absolute value so \(m_1=0.5\text{.}\) Therefore, \(\overline{\xvec}_1=\frac1{0.5}\xvec_1 =
\twovec1{0.8}\text{.}\)
In the same way, we obtain \(\xvec_2=A\overline{\xvec}_1 = \twovec{0.66}{0.96}\text{.}\) We see that \(m_2=0.96\) so we have \(\overline{\xvec}_2= \twovec{0.688}{1}\text{.}\)
We see that the vectors \(\overline{\xvec}_k\) are getting closer and closer to \(\twovec{0.5}1\text{,}\) which we therefore identify as an eigenvector associated to the dominant eigenvalue.
We see that \(A\twovec{0.5}{1} =
\twovec{0.45}{0.9}=0.9 \twovec{0.5}1\text{.}\) Therefore, the dominant eigenvalue is \(\lambda_1=0.9\text{.}\)
More generally, we see that the multipliers \(m_k\) will converge to the dominant eigenvalue.
The power method constructs a sequence of vectors \(\overline{\xvec}_k\) converging to an eigenvector \(\vvec_1=\twovec{1}{\frac23}\text{.}\) The multipliers \(m_k\) converge to \(\lambda_1=-1.3\text{,}\) the dominant eigenvalue.
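The Python cell referenced in the activity is not reproduced here; the following is a minimal sketch, assuming NumPy, of how a function with the interface power_eig(A, x0, N) might be written. Applied to the matrix from the last part of the activity, the printed multipliers approach \(-1.3\) and the printed vectors approach \(\twovec{1}{\frac23}\text{.}\)

import numpy as np

def power_eig(A, x0, N):
    # Run N steps of the power method, printing the multiplier m_k and the
    # normalized vector xbar_k at each step.  (A sketch; the cell used in
    # the activity may differ in its details.)
    A = np.array(A, dtype=float)
    x = np.array(x0, dtype=float)
    for k in range(1, N + 1):
        x = A @ x
        m = x[np.argmax(np.abs(x))]   # component of largest absolute value
        x = x / m                     # rescale so that component equals 1
        print(k, m, x)
    return m, x

# The matrix from the last part of the activity.
power_eig([[-5.1, 5.7],
           [-3.8, 4.4]], [1, 0], 20)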
Notice that the power method gives us not only an eigenvector \(\vvec\) but also its associated eigenvalue. As in the activity, consider the matrix \(A=\left[\begin{array}{rr}
-5.1 \amp 5.7 \\
-3.8 \amp 4.4 \\
\end{array}\right]
\text{,}\) which has eigenvector \(\vvec=\twovec{3}{2}\text{.}\) The first component has the largest absolute value so we multiply by \(\frac13\) to obtain \(\overline{\vvec}=\twovec{1}{\frac23}\text{.}\) When we multiply by \(A\text{,}\) we have \(A\overline{\vvec} \approx \twovec{-1.30}{-0.87}\text{.}\) Notice that the first component still has the largest absolute value so that the multiplier \(m=-1.3\) is the eigenvalue \(\lambda\) corresponding to the eigenvector. This demonstrates the fact that the multipliers \(m_k\) approach the eigenvalue \(\lambda\) having the largest absolute value.
Notice that the power method requires us to choose an initial vector
\(\xvec_0\text{.}\) For most choices, this method will find the eigenvalue having the largest absolute value. However, an unfortunate choice of
\(\xvec_0\) may not. For instance, if we had chosen
\(\xvec_0 = \vvec_2 = \twovec{-1}{1}\) in
Example 5.6.2, the vectors in the sequence
\(\xvec_k = A^k\xvec_0=\lambda_2^k\vvec_2\) will not detect the eigenvector
\(\vvec_1\text{.}\) However, it usually happens that our initial guess
\(\xvec_0\) has some contribution from
\(\vvec_1\) that enables us to find it.
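To see this failure numerically, here is a brief sketch (assuming NumPy; not part of the text's cells) that runs the multiply-then-rescale loop of Example 5.6.2 starting from \(\xvec_0 = \vvec_2 = \twovec{-1}{1}\text{.}\) The multipliers settle at \(\lambda_2 = 1\) rather than \(\lambda_1 = 3\text{,}\) and the vectors remain multiples of \(\vvec_2\text{.}\)

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# An unfortunate initial vector: x_0 = v_2, an eigenvector for lambda_2 = 1.
x = np.array([-1.0, 1.0])

for k in range(1, 9):
    x = A @ x
    m = x[np.argmax(np.abs(x))]
    x = x / m
    print(k, m, x)   # m settles at 1, and x stays a multiple of v_2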
The power method, as presented here, will fail for some combinations of matrix \(A\) and initial vector \(\xvec_0\text{.}\) Although the power method does not work for all matrices, there are some conditions that guarantee that it will work and can tell us something about how quickly the method converges.
Suppose we have a diagonalizable matrix
\(A\text{.}\) Because
\(A\) is diagonalizable, by
Proposition 5.3.7, we know that
\(A\) has
\(n\) linearly independent eigenvectors. Let’s choose the eigenvectors so that
\(\len{\vvec_i}_\infty = 1\) for each
\(i\text{.}\) Because the eigenvectors form a basis for
\(\real^n\text{,}\) we can write any
\(\xvec_0\) as
\begin{equation*}
\xvec_0 = c_1 \vvec_1 + c_2 \vvec_2 + \cdots + c_n \vvec_n
\end{equation*}
where \(\vvec_i\) is an eigenvector associated with the \(i\)th eigenvalue (sorted by decreasing absolute value \(|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_n|\)).
Now let’s work out expressions for the unnormalized vectors \(\xvec_1, \xvec_2, \dots \) and normalized vectors \(\overline{\xvec}_1, \overline{\xvec}_2, \dots \) produced by the power method. First notice that for any vector \(\xvec\) and scalar \(m\text{,}\) \(A \frac{\xvec}{m} = \frac{1}{m} A \xvec\text{,}\) so
\begin{equation*}
\overline{\xvec}_k = \frac{\xvec_k}{m_1 m_2 \cdots m_k} \text{.}
\end{equation*}
This means that although the algorithm rescales at every step along the way, algebraically we can do all the scaling at the end, focusing our attention first on what happens to the unnormalized \(\xvec_1, \xvec_2, \dots \text{.}\)
\begin{align*}
\xvec_k = A^k \xvec_0 \amp = A^k c_1 \vvec_1 + A^k c_2 \vvec_2 + \cdots + A^k c_n\vvec_n\\
\amp = c_1 \lambda_1^k \vvec_1 + c_2 \lambda_2^k \vvec_2 + \cdots + c_n \lambda_n^k \vvec_n\\
\amp = \lambda_1^k \left[ c_1 \vvec_1
+ c_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k \vvec_2
+ \cdots
+ c_n \left(\frac{\lambda_n}{\lambda_1}\right)^k \vvec_n \right]\\
\amp \\
\overline{\xvec}_k \amp = \frac{\lambda_1^k}{m_1 m_2 \cdots m_k}
\left[
c_1 \vvec_1
+ c_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k \vvec_2
+ \cdots
+ c_n \left(\frac{\lambda_n}{\lambda_1}\right)^k \vvec_n
\right]\text{.}
\end{align*}
Let’s further assume that we have a dominant eigenvalue (i.e., \(|\lambda_1| > |\lambda_2|\)) and that \(\dim E_{\lambda_1} = 1\text{.}\) We can scale the dominant eigenvector so that \(\len{\vvec_1}_\infty = 1\text{.}\) Since \(\lambda_1\) is dominant, for \(i \ge 2\text{,}\)
\begin{equation*}
\lim_{k \to \infty} \left(\frac{\lambda_i}{\lambda_1}\right)^k = 0\text{.}
\end{equation*}
So the approximation
\begin{equation*}
\overline{\xvec}_k \approx \frac{\lambda_1^k}{m_1 m_2 \cdots m_k} c_1 \vvec_1
\end{equation*}
improves as \(k\) gets larger, as long as \(c_1 \neq 0\text{.}\)
Since \(\len{\overline{\xvec}_k}_\infty = 1 = \len{\vvec_1}_\infty\text{,}\) we know that
\begin{equation*}
\lim_{k \to \infty} \frac{c_1 \lambda_1^k}{m_1 m_2 \cdots m_k} = 1\text{.}
\end{equation*}
From this it follows that
\begin{equation*}
\lim_{k \to \infty} \overline{\xvec}_k = \vvec_1\text{,}
\end{equation*}
the dominant eigenvector. Furthermore, the lagged sequence also converges to 1:
\begin{equation*}
\lim_{k \to \infty} \frac{c_1 \lambda_1^{k-1}}{m_1 m_2 \cdots m_{k-1}} = 1\text{.}
\end{equation*}
Dividing the first sequence by the lagged one leaves \(\frac{c_1 \lambda_1^{k}/(m_1 m_2 \cdots m_k)}{c_1 \lambda_1^{k-1}/(m_1 m_2 \cdots m_{k-1})} = \frac{\lambda_1}{m_k}\text{,}\) so this ratio also converges to 1:
\begin{equation*}
\lim_{k \to \infty} \frac{\lambda_1}{m_k} = 1
\end{equation*}
and
\begin{equation*}
\lim_{k \to \infty} m_k = \lambda_1\text{.}
\end{equation*}
The rate of convergence depends on how quickly \(\left(\frac{\lambda_i}{\lambda_1}\right)^k\) converges to 0. If the magnitude of the dominant eigenvalue is much larger than the magnitudes of the other eigenvalues, then the convergence will be fast. But if \(|\lambda_2|\) is nearly as large as \(|\lambda_1|\text{,}\) convergence will be slower.
It is important to remember that although this argument makes use of the eigenvectors, these will not be known when we are employing the algorithm. In particular, we won’t know that we have chosen \(\xvec_0\) so that \(c_1 \neq 0\text{.}\) But most initial vectors will suffice since the vectors with \(c_1 = 0\) lie in the span of \(\vvec_2, \vvec_3, \dots, \vvec_n\) and so form an \((n-1)\)-dimensional subspace of \(\real^n\text{.}\)
Proposition 5.6.4.
If \(A\) is an \(n \by n\) diagonalizable matrix with a dominant eigenvalue \(\lambda_1\text{,}\) and \(\dim E_{\lambda_1} = 1\text{,}\) then there exists a vector \(\xvec_0 \in \real^n\) such that the sequence of vectors produced by the power method converges to the dominant eigenvector:
\begin{equation*}
\overline{\xvec}_1, \overline{\xvec}_2, \overline{\xvec}_3, \dots, \overline{\xvec}_k, \dots \to \vvec_1
\end{equation*}
and the sequence of scaling factors converges to the dominant eigenvalue:
\begin{equation*}
m_1, m_2, m_3 \dots, m_k, \dots \to \lambda_1\text{.}
\end{equation*}
The convergence is faster when the difference between \(|\lambda_1|\) and the magnitudes of the other eigenvalues is larger.
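To illustrate how the eigenvalue gap affects the speed of convergence, here is a brief sketch (assuming NumPy; the helper function multipliers and the step count are only for this illustration) that tracks how quickly the multipliers \(m_k\) approach \(\lambda_1\) for the matrices of Examples 5.6.1 and 5.6.2, for which \(|\lambda_2/\lambda_1|\) is \(0.1\) and \(\frac13\) respectively.

import numpy as np

def multipliers(A, x0, N):
    # Return the multipliers m_1, ..., m_N produced by the power method.
    x = np.array(x0, dtype=float)
    ms = []
    for _ in range(N):
        x = A @ x
        m = x[np.argmax(np.abs(x))]
        x = x / m
        ms.append(m)
    return ms

x0 = [1.0, 0.0]

# Example 5.6.1: eigenvalues 1 and 0.1, so |lambda_2/lambda_1| = 0.1.
A_fast = np.array([[0.7, 0.6], [0.3, 0.4]])

# Example 5.6.2: eigenvalues 3 and 1, so |lambda_2/lambda_1| = 1/3.
A_slow = np.array([[2.0, 1.0], [1.0, 2.0]])

# Compare how fast |m_k - lambda_1| shrinks in each case.
for label, A, lam1 in [("ratio 0.1", A_fast, 1.0), ("ratio 1/3", A_slow, 3.0)]:
    errors = [abs(m - lam1) for m in multipliers(A, x0, 8)]
    print(label, ["%.2e" % e for e in errors])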