Skip to main content

Section 4.7 Partial pivoting and LU factorizations

Our principal tool for finding solutions to linear systems has been Gaussian elimination, which we first met back in Section 2.2. When presented with a linear system, this method finds the reduced row echelon form of the system’s augmented matrix and uses it to read off the solution (with the help of a little back substitution).
While this is a convenient approach for learning linear algebra, people rarely use the reduced row echelon form of a matrix. In fact, many linear algebra software packages (including numpy and scipy) do not include functions for finding the reduced row echelon form.
In this section, we will describe why this is the case and then explore some alternatives. The intent of this section is to demonstrate how linear algebraic computations are handled in practice and to introduce the important idea of matrix factorization, writing a matrix as the product of matrices, each of which has a specified form.

Preview Activity 4.7.1.

To begin, let’s recall how we implemented Gaussian elimination by considering the matrix
\begin{equation*} A = \left[\begin{array}{rrrr} 1 \amp 2 \amp -1 \amp 2 \\ 1 \amp 0 \amp -2 \amp 1 \\ 3 \amp 2 \amp 1 \amp 0 \\ \end{array}\right] \end{equation*}
  1. What is the first row operation we perform? If the resulting matrix is \(A_1\text{,}\) find a matrix \(E_1\) such that \(E_1A = A_1\text{.}\)
  2. What is the matrix inverse \(E_1^{-1}\text{?}\) You can find this using your favorite technique for finding a matrix inverse. However, it may be easier to think about the effect that the row operation has and how it can be undone.
  3. Perform the next two steps in the Gaussian elimination algorithm to obtain \(A_3\text{.}\) Represent these steps using multiplication by matrices \(E_2\) and \(E_3\) so that
    \begin{equation*} E_3E_2E_1A = A_3\text{.} \end{equation*}
  4. Suppose we need to scale the second row by \(-2\text{.}\) What is the \(3\by3\) matrix that perfoms this row operation by left multiplication?
  5. Suppose that we need to interchange the first and second rows. What is the \(3\by3\) matrix that performs this row operation by left multiplication?
Solution.
  1. We first multiply the first row by \(-1\) and add to the second row. This can be represented by multiplying \(E_1A=A_1\) where
    \begin{equation*} E_1=\left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ -1 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{array}\right], A_1= \left[\begin{array}{rrrr} 1 \amp 2 \amp -1 \amp 2 \\ 0 \amp -2 \amp -1 \amp -1 \\ 3 \amp 2 \amp 1 \amp 0 \end{array}\right]\text{.} \end{equation*}
  2. To undo the row operation, we multiply the first row by \(1\) and add to the second row. This shows that \(E_1^{-1} = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 1 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \end{array}\right]\text{.}\)
  3. We have
    \begin{equation*} \begin{array}{c} E_2 = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ -3 \amp 0 \amp 1 \end{array}\right], E_3=\left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp -2 \amp 1 \end{array}\right], \\ A_3=\left[\begin{array}{rrrr} 1 \amp 2 \amp -1 \amp 2 \\ 0 \amp -2 \amp -1 \amp -1 \\ 0 \amp 0 \amp 6 \amp -4 \end{array}\right] \end{array}\text{.} \end{equation*}
  4. The matrix performing this scaling is \(\left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp -2 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{array}\right]\text{.}\)
  5. The matrix performing this interchange is \(\left[\begin{array}{rrr} 0 \amp 1 \amp 0 \\ 1 \amp 0 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{array}\right]\text{.}\)

Subsection 4.7.1 Partial pivoting

The first issue that we will address is the fact that computers do not perform arithemtic operations exactly. For instance, if we ask Python to evaluate 0.1 + 0.2, it reports 0.30000000000000004 though we know that the true value is 0.3.
There are a couple of reasons for this. First, computers perform arithmetic using base 2 numbers, which means that numbers we enter in decimal form, such as \(0.1\text{,}\) must be converted to base 2. Even though 0.1 has a simple decimal form, its representation in base 2 is the repeating decimal
\begin{equation*} 0.000110011001100110011001100110011001100110011\ldots\text{,} \end{equation*}
To accurately represent this number inside a computer would require infinitely many digits. Since a computer can only hold a finite number of digits, we are necessarily using an approximation just by representing this number in a computer.
In addition, arithmetic operations, such as addition, are prone to error. To keep things simple, suppose we have a computer that represents numbers using only three decimal digits. For instance, the number 1.023 would be represented as 1.02 while 0.023421 would be 0.0234. If we add these numbers, we have 1.023 + 0.023421 = 1.046421; the computer reports this sum as 1.02 + 0.0234 = 1.04, whose last digit is not correctly rounded. Generally speaking, we will see this problem, which is called round off error, whenever we add numbers of signficantly different magnitudes.
Remember that Gaussian elimination, when applied to an \(n\by n\) matrix, requires approximately \(\frac 23 n^3\) operations. If we have a \(1000\by1000\) matrix, performing Gaussian elimination requires roughly a billion operations, and the errors introduced in each operation could accumulate. How can we have confidence in the final result? We can never completely avoid these errors, but we can take steps to mitigate them. The next activity will introduce one such technique.

Activity 4.7.2.

Suppose we have a hypothetical computer that represents numbers using only three decimal digits. We will consider the linear system
\begin{equation*} \begin{alignedat}{3} 0.0001x \amp {}+{} \amp y \amp {}={} \amp 1 \\ x \amp {}+{} \amp y \amp {}={} \amp 2 \\ \end{alignedat}\text{.} \end{equation*}
  1. Show that this system has the unique solution
    \begin{equation*} \begin{aligned} x \amp {}={} \frac{10000}{9999} = 1.00010001\ldots, \\ y \amp {}={} \frac{9998}{9999} = 0.99989998\ldots \end{aligned}\text{.} \end{equation*}
  2. If we represent this solution inside our computer that only holds 3 decimal digits, what do we find for the solution? This is the best that we can hope to find using our computer.
  3. Let’s imagine that we use our computer to find the solution using Gaussian elimination; that is, after every arithmetic operation, we keep only three decimal digits. Our first step is to multiply the first equation by 10000 and subtract it from the second equation. If we represent numbers using only three decimal digits, what does this give for the value of \(y\text{?}\)
  4. By substituting our value for \(y\) into the first equation, what do we find for \(x\text{?}\)
  5. Compare the solution we find on our computer with the actual solution and assess the quality of the approximation.
  6. Let’s now modify the linear system by simplying interchanging the equations:
    \begin{equation*} \begin{alignedat}{3} x \amp {}+{} \amp y \amp {}={} \amp 2 \\ 0.0001x \amp {}+{} \amp y \amp {}={} \amp 1 \\ \end{alignedat}\text{.} \end{equation*}
    Of course, this doesn’t change the actual solution. Let’s imagine we use our computer to find the solution using Gaussian elimination. Perform the first step where we multiply the first equation by 0.0001 and subtract from the second equation. What does this give for \(y\) if we represent numbers using only three decimal digits?
  7. Substitute the value you found for \(y\) into the first equation and solve for \(x\text{.}\) Then compare the approximate solution found with our hypothetical computer to the exact solution.
  8. Which approach produces the most accurate approximation?
Solution.
  1. We may find this result by forming the augmented matrix
    \begin{equation*} \left[\begin{array}{rr|r} \frac1{10000} \amp 1 \amp 1 \\ 1 \amp 1 \amp 2 \\ \end{array}\right] \end{equation*}
    and row reducing.
  2. The solution would be rounded to \(x=1.00\) and \(y=1.00\text{.}\)
  3. We first multiply the first row by \(-10000\) and add to the second row. This gives
    \begin{equation*} \begin{array}{l} \left[\begin{array}{rr|r} 0.0001 \amp 1 \amp 1 \\ 0 \amp -9999 \amp -9998 \\ \end{array}\right] \\ \approx \left[\begin{array}{rr|r} 0.0001 \amp 1 \amp 1 \\ 0 \amp -10000 \amp -10000 \\ \end{array}\right] \\ \sim \left[\begin{array}{rr|r} 0.0001 \amp 1 \amp 1 \\ 0 \amp 1 \amp 1 \\ \end{array}\right] \end{array}\text{.} \end{equation*}
    This tells us that \(y\approx 1\text{.}\)
  4. The next steps in the Gaussian elimination algorithm give us
    \begin{equation*} \left[\begin{array}{rr|r} 0.0001 \amp 0 \amp 0 \\ 0 \amp 1 \amp 1 \\ \end{array}\right] \sim \left[\begin{array}{rr|r} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 1 \\ \end{array}\right] \end{equation*}
    so have the approximation solution \(x=0.00\) and \(y=1.00\text{.}\)
  5. This compares to the actual solution, as represented in our computer, as \(x=1.00\) and \(y=1.00\text{.}\) Notice that the value of \(x\) differs considerably.
  6. Now we have
    \begin{equation*} \begin{array}{rcl} \left[\begin{array}{rr|r} 1 \amp 1 \amp 2 \\ 0.0001 \amp 1 \amp 1 \\ \end{array}\right] \amp \sim \amp \left[\begin{array}{rr|r} 1 \amp 1 \amp 2 \\ 0 \amp 0.9999 \amp 0.9998 \\ \end{array}\right] \\ \amp \approx \amp \left[\begin{array}{rr|r} 1 \amp 1 \amp 2 \\ 0 \amp 1 \amp 1 \\ \end{array}\right] \end{array}\text{,} \end{equation*}
    which gives \(y=1.00\text{.}\)
  7. Performing one more row operation gives us
    \begin{equation*} \left[\begin{array}{rr|r} 1 \amp 0 \amp 1 \\ 0 \amp 1 \amp 1 \\ \end{array}\right]\text{,} \end{equation*}
    which shows the approximate solution as \(x=1.00\) and \(y=1.00\text{.}\)
  8. The second approach gives an approximate solution that is as accurate as possible given the computer’s limited ability to store digits.
This activity demonstrates how the practical aspects of computing differ from the theoretical. We know that the order in which we write the equations has no effect on the solution space; row interchange is one of our three allowed row operations in the Gaussian elimination algorithm. However, when we are only able to perform arithmetic operations approximately, applying row interchanges can dramatically improve the accuracy of our approximations.
If we could compute the solution exactly, we find
\begin{equation*} x = 1.00010001\ldots, \qquad y = 0.99989998\ldots\text{.} \end{equation*}
Since our hypothetical computer represents numbers using only three decimal digits, our computer finds
\begin{equation*} x \approx 1.00, \qquad y \approx 1.00. \end{equation*}
This is the best we can hope to do with our computer since it is impossible to represent the solution exactly.
When the equations are written in their original order and we multiply the first equation by 10000 and subtract from the second, we find
\begin{equation*} \begin{aligned} (1-10000)y \amp {}={}2-10000 \\ -9999 y \amp {}={} -9998 \\ -10000y\amp {}\approx{} -10000 \\ y \amp {}\approx{} 1.00 \\ \end{aligned}\text{.} \end{equation*}
In fact, we find the same value for \(y\) when we interchange the equations. Here we multiply the first equation by 0.0001 and subtract from the second equation. We then find
\begin{equation*} \begin{aligned} (1-0.0001)y \amp {}={}2-0.0001 \\ -0.9999 y \amp {}={} -0.9998 \\ -y\amp {}\approx{} -1.00 \\ y \amp {}\approx{} 1.00 \\ \end{aligned}\text{.} \end{equation*}
The difference occurs when we substitute \(y\approx 1\) into the first equation. When the equations are written in their original order, we have
\begin{equation*} \begin{aligned} 0.0001x + 1.00 \amp {}\approx{} 1.00 \\ 0.0001x \amp {}\approx{} 0.00 \\ x \amp {}\approx{} 0.00 \\ \end{aligned}\text{.} \end{equation*}
When the equations are written in their original order, we find the solution \(x\approx 0.00, y \approx 1.00\text{.}\)
When we write the equation in the opposite order, however, substituting \(y\approx 1\) into the first equation gives
\begin{equation*} \begin{aligned} x + 1.00 \amp {}\approx{} 2.00 \\ x \amp {}\approx{} 1.00 \\ \end{aligned}\text{.} \end{equation*}
In this case, we find the approximate solution \(x\approx 1.00, y\approx1.00\text{,}\) which is the most accurate solution that our hypothetical computer can find. Simply interchanging the order of the equation produces a much more accurate solution.
We can understand why this works graphically. Each equation represents a line in the plane, and the solution is the intersection point. Notice that the slopes of these lines differ considerably.
When the equations are written in their original order, we substitute \(y\approx1\) into the equation \(0.00001x + y = 1\text{,}\) which is a nearly horizontal line. Along this line, a small change in \(y\) leads to a large change in \(x\text{.}\) The slight difference in our approximation \(y\approx 1\) from the exact value \(y=0.9998999\ldots\) leads to a large difference in the approximation \(x\approx0\) from the exact value \(x=1.00010001\ldots\text{.}\)
If we exchange the order in which the equations are written, we substitute our approximation \(y\approx 1\) into the equation \(x+y=2\text{.}\) Notice that the slope of the associated line is \(-1\text{.}\) On this line, a small change in \(y\) leads to a relatively small change in \(x\) as well. Therefore, the difference in our approximation \(y\approx1\) from the exact value leads to only a small difference in the approximation \(x\approx1\) from the exact value.
This example motivates the technique that computers usually use to perform Gaussian elimation. We only need to perform a row interchange when a zero occurs in a pivot position, such as
\begin{equation*} \left[\begin{array}{rrrr} 1 \amp -1 \amp 2 \amp 2 \\ 0 \amp 0 \amp -3 \amp 1 \\ 0 \amp 2 \amp 2 \amp -3 \\ \end{array}\right]\text{.} \end{equation*}
However, we will perform a row interchange to put the entry having the largest possible absolute value into the pivot position. For instance, when performing Gaussian elimination on the following matrix, we begin by interchanging the first and third rows so that the upper left entry has the largest possible absolute value.
\begin{equation*} \left[\begin{array}{rrrr} 2 \amp 1 \amp 2 \amp 3 \\ 1 \amp -3 \amp -2 \amp 1 \\ -3 \amp 2 \amp 3 \amp -2 \\ \end{array}\right] \sim \left[\begin{array}{rrrr} -3 \amp 2 \amp 3 \amp -2 \\ 1 \amp -3 \amp -2 \amp 1 \\ 2 \amp 1 \amp 2 \amp 3 \\ \end{array}\right]\text{.} \end{equation*}
This technique is called partial pivoting, and it means that, in practice, we will perform many more row interchange operations than we typically do when computing exactly by hand.

Subsection 4.7.2 \(LU\) factorizations

In Subsection 2.3.3, we saw that the number of arithmetic operations needed to perform Gaussian elimination on an \(n\by n\) matrix is about \(\frac23 n^3\text{.}\) This means that a \(1000\by1000\) matrix, requires about two thirds of a billion operations.
Suppose that we have two equations, \(A\xvec = \bvec_1\) and \(A\xvec = \bvec_2\text{,}\) that we would like to solve. Usually, we would form augmented matrices \(\left[\begin{array}{c|c} A \amp \bvec_1 \\ \end{array}\right]\) and \(\left[\begin{array}{c|c} A \amp \bvec_2 \\ \end{array}\right]\) and apply Gaussian elimination. Of course, the steps we perform in these two computations are nearly identical. Is there a way to store some of the computation we perform in reducing \(\left[\begin{array}{c|c} A \amp \bvec_1 \\ \end{array}\right]\) and reuse it in solving subsequent equations? The next activity will point us in the right direction.

Activity 4.7.3.

We will consider the matrix
\begin{equation*} A = \left[\begin{array}{rrr} 1 \amp 2 \amp 1 \\ -2 \amp -3 \amp -2 \\ 3 \amp 7 \amp 4 \\ \end{array}\right] \end{equation*}
and begin performing Gaussian elimination without using partial pivoting.
  1. Perform two row replacement operations to find the row equivalent matrix
    \begin{equation*} A' = \left[\begin{array}{rrr} 1 \amp 2 \amp 1 \\ 0 \amp 1 \amp 0 \\ 0 \amp 1 \amp 1 \\ \end{array}\right]\text{.} \end{equation*}
    Find elementary matrices \(E_1\) and \(E_2\) that perform these two operations so that \(E_2E_1 A = A'\text{.}\)
  2. Perform a third row replacement to find the upper triangular matrix
    \begin{equation*} U = \left[\begin{array}{rrr} 1 \amp 2 \amp 1 \\ 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{array}\right]\text{.} \end{equation*}
    Find the elementary matrix \(E_3\) such that \(E_3E_2E_1A = U\text{.}\)
  3. We can write \(A=E_1^{-1}E_2^{-1}E_3^{-1} U\text{.}\) Find the inverse matrices \(E_1^{-1}\text{,}\) \(E_2^{-1}\text{,}\) and \(E_3^{-1}\) and the product \(L = E_1^{-1}E_2^{-1}E_3^{-1}\text{.}\) Then verify that \(A=LU\text{.}\)
  4. Suppose that we want to solve the equation \(A\xvec = \bvec = \threevec4{-7}{12}\text{.}\) We will write
    \begin{equation*} A\xvec = LU\xvec = L(U\xvec) = \bvec \end{equation*}
    and introduce an unknown vector \(\cvec\) such that \(U\xvec = \cvec\text{.}\) Find \(\cvec\) by noting that \(L\cvec = \bvec\) and solving this equation.
  5. Now that we have found \(\cvec\text{,}\) find \(\xvec\) by solving \(U\xvec = \cvec\text{.}\)
  6. Using the factorization \(A=LU\) and this two-step process, solve the equation \(A\xvec = \threevec{2}{-2}{7}\text{.}\)
Solution.
  1. We have
    \begin{equation*} E_1 = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 2 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \end{array}\right], E_2 = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ -3 \amp 0 \amp 1 \end{array}\right]\text{.} \end{equation*}
  2. The third row replacement is performed by
    \begin{equation*} E_3 = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp -1 \amp 1 \\ \end{array}\right]\text{.} \end{equation*}
  3. The inverse matrices are found to be
    \begin{equation*} E_1^{-1}= \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ -2 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{array}\right], E_2^{-1}= \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 3 \amp 0 \amp 1 \\ \end{array}\right], E_3^{-1}= \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 1 \amp 1 \\ \end{array}\right] \end{equation*}
    giving \(L = E_1^{-1}E_2^{-1}E_3^{-1} = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ -2 \amp 1 \amp 0 \\ 3 \amp 1 \amp 1 \end{array}\right]\) so that \(A = LU\text{.}\)
  4. Noting that \(L\cvec=\bvec\text{,}\) we find \(\cvec=\threevec41{-1}\text{.}\)
  5. To find \(\xvec\text{,}\) we solve the equation \(U\xvec=\cvec\) to obtain \(\xvec=\threevec31{-1}\text{.}\)
  6. We solve \(L\cvec=\threevec{2}{-2}{7}\) to find \(\cvec=\threevec22{-1}\) and \(U\xvec=\cvec\) to find \(\xvec=\threevec{-1}{2}{-1}\text{.}\)
This activity introduces a method for factoring a matrix \(A\) as a product of two triangular matrices, \(A=LU\text{,}\) where \(L\) is lower triangular and \(U\) is upper triangular. The key to finding this factorization is to represent the row operations that we apply in the Gaussian elimination algorithm through multiplication by elementary matrices.

Example 4.7.1.

Suppose we have the equation
\begin{equation*} \begin{bmatrix} 2 \amp -3 \amp 1 \\ -4 \amp 5 \amp 0 \\ 2 \amp -2 \amp 2 \end{bmatrix} \xvec = \cthreevec8{-13}8, \end{equation*}
which we write in the form \(A\xvec = \bvec\text{.}\) We begin by applying the Gaussian elimination algorithm to find an \(LU\) factorization of \(A\text{.}\)
The first step is to multiply the first row of \(A\) by \(2\) and add it to the second row. The elementary matrix
\begin{equation*} E_1 = \begin{bmatrix} 1 \amp 0 \amp 0 \\ 2 \amp 1 \amp 0 \\ 0 \amp 0 \amp 0 \\ \end{bmatrix} \end{equation*}
performs this operation so that \(E_1A = \begin{bmatrix} 2 \amp -3 \amp 1 \\ 0 \amp -1 \amp 2 \\ 2 \amp -2 \amp 2 \end{bmatrix} \text{.}\)
We next apply matrices
\begin{equation*} E_2 = \begin{bmatrix} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ -1 \amp 0 \amp 1 \end{bmatrix},~~~ E_3 = \begin{bmatrix} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 1 \amp 1 \end{bmatrix} \end{equation*}
to obtain the upper triangular matrix \(U = E_3E_2E_1 A = \begin{bmatrix} 2 \amp -3 \amp 1 \\ 0 \amp -1 \amp 2 \\ 0 \amp 0 \amp 3 \end{bmatrix} \text{.}\)
We can write \(U = (E_3E_2E_1)A\text{,}\) which tells us that
\begin{equation*} A = (E_3E_2E_1)^{-1}U = \begin{bmatrix} 1 \amp 0 \amp 0 \\ -2 \amp 1 \amp 0 \\ 1 \amp -1 \amp 1 \end{bmatrix} U = LU. \end{equation*}
That is, we have
\begin{equation*} A = LU = \begin{bmatrix} 1 \amp 0 \amp 0 \\ -2 \amp 1 \amp 0 \\ 1 \amp -1 \amp 1 \end{bmatrix} \begin{bmatrix} 2 \amp -3 \amp 1 \\ 0 \amp -1 \amp 2 \\ 0 \amp 0 \amp 3 \end{bmatrix}. \end{equation*}
Notice that the matrix \(L\) is lower triangular, a result of the fact that the elementary matrices \(E_1\text{,}\) \(E_2\text{,}\) and \(E_3\) are lower triangular.
Now that we have factored \(A=LU\) into two triangular matrices, we can solve the equation \(A\xvec = \bvec\) by solving two triangular systems. We write
\begin{equation*} A\xvec = L(U\xvec) = \bvec \end{equation*}
and define the unknown vector \(\cvec = U\xvec\text{,}\) which is determined by the equation \(L\cvec = \bvec\text{.}\) Because \(L\) is lower triangular, we find the solution using forward substitution, \(\cvec = \threevec833\text{.}\) Finally, we find \(\xvec\text{,}\) the solution to our original system \(A\xvec = \bvec\text{,}\) by applying back substitution to solve \(U\xvec = \cvec\text{.}\) This gives \(\xvec = \threevec2{-1}1\text{.}\)
If we want to solve \(A\xvec = \bvec\) for a different right-hand side \(\bvec\text{,}\) we can simply repeat this two-step process.
An \(LU\) factorization allow us to trade in one equation \(A\xvec = \bvec\) for two simpler equations
\begin{equation*} \begin{aligned} L\cvec \amp = \bvec \\ U\xvec \amp = \cvec. \\ \end{aligned} \end{equation*}
For instance, the equation \(L\cvec = \bvec\) in our example has the form
\begin{equation*} \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ -2 \amp 1 \amp 0 \\ 1 \amp -1 \amp 1 \end{array}\right]\cvec = \threevec8{-13}8\text{.} \end{equation*}
Because \(L\) is a lower-triangular matrix, we can read off the first component of \(\cvec\) directly from the equations: \(c_1 = 8\text{.}\) We then have \(-2c_1+c_2 = -13\text{,}\) which gives \(c_2 = 3\text{,}\) and \(c_1 - c_2 + c_3 = 8\text{,}\) which gives \(c_3=3\text{.}\) Solving a triangular system is simplified because we only need to perform a sequence of substitutions.
In fact, solving an equation with an \(n\by n\) triangular matrix requires approximately \(\frac 12 n^2\) operations. Once we have the factorization \(A=LU\text{,}\) we solve the equation \(A\xvec=\bvec\) by solving two equations involving triangular matrices, which requires about \(n^2\) operations. For example, if \(A\) is a \(1000\by1000\) matrix, we solve the equation \(A\xvec = \bvec\) using about one million steps. This compares with roughly a billion operations needed to perform Gaussian elimination, which represents a significant savings. Of course, we have to first find the \(LU\) factorization of \(A\) and this requires roughly the same amount of work as performing Gaussian elimination. However, once we have the \(LU\) factorization, we can use it to solve \(A\xvec=\bvec\) for different right hand sides \(\bvec\text{.}\)
Our discussion so far has ignored one issue, however. Remember that we sometimes have to perform row interchange operations in addition to row replacement. A typical row interchange is represented by multiplication by a matrix such as
\begin{equation*} P = \left[\begin{array}{rrr} 0 \amp 0 \amp 1 \\ 0 \amp 1 \amp 0 \\ 1 \amp 0 \amp 0 \\ \end{array}\right]\text{,} \end{equation*}
which has the effect of interchanging the first and third rows. Notice that this matrix is not triangular so performing a row interchange will disrupt the structure of the \(LU\) factorization we seek. Without giving the details, we simply note that linear algebra software packages compute a matrix \(P\) that describes how the rows are permuted in the Gaussian elimination process.
Once we have \(PA = LU\text{,}\) where \(P\) is a (row) permutation matrix, \(L\) is lower triangular, and \(U\) is upper triangular, we can use this to solve the equation \(A\xvec = \bvec\) using backsubstitution twice. Notice that
\begin{equation*} PA\xvec = LU\xvec = P\bvec\text{.} \end{equation*}
We can determine \(U\xvec\) by solving \(L\cvec = P\bvec\) to find \(\cvec\text{,}\) and then we find \(\xvec\) by solving \(U\xvec = \cvec\text{.}\)

Activity 4.7.4.

The scipy.linalg package can compute \(LU\) factorizations; we write P, L, U = scipy.linalg.LU(A) to obtain the matrices \(P\text{,}\) \(L\text{,}\) and \(U\) such that \(PA = LU\text{.}\) Here is an example.
  1. In Example 4.7.1, we found the \(LU\) factorization
    \begin{equation*} A = \begin{bmatrix} 2 \amp -3 \amp 1 \\ -4 \amp 5 \amp 0 \\ 2 \amp -2 \amp 2 \end{bmatrix} = \begin{bmatrix} 1 \amp 0 \amp 0 \\ -2 \amp 1 \amp 0 \\ 1 \amp -1 \amp 1 \end{bmatrix} \begin{bmatrix} 2 \amp -3 \amp 1 \\ 0 \amp -1 \amp 2 \\ 0 \amp 0 \amp 3 \end{bmatrix}=LU. \end{equation*}
    Using Python, define the matrix \(A\) and then ask for the \(LU\) factorization. What are the matrices \(P\text{,}\) \(L\text{,}\) and \(U\text{?}\)
    Notice that scipy.linalg.LI() finds a different \(LU\) factorization than we found in the previous activity because it is using partial pivoting, as described in the previous section, when it performs Gaussian elimination.
  2. Define the vector \(\bvec = \threevec8{-13}{8}\) in Python and compute \(P\bvec\text{.}\)
  3. Use the matrices L and U to solve \(L\cvec = P\bvec\) and \(U\xvec = \cvec\text{.}\) You should find the same solution \(\xvec\) that you found in the previous activity.
  4. Use the factorization to solve the equation \(A\xvec = \threevec9{-16}{10}\text{.}\)
  5. How does the factorization show us that \(A\) is invertible and that, therefore, every equation \(A\xvec=\bvec\) has a unique solution?
  6. Suppose that we have the matrix
    \begin{equation*} B = \left[\begin{array}{rrr} 3 \amp -1 \amp 2 \\ 2 \amp -1 \amp 1 \\ 2 \amp 1 \amp 3 \\ \end{array}\right]\text{.} \end{equation*}
    Use Python to find the \(LU\) factorization. Explain how the factorization shows that \(B\) is not invertible.
  7. Consider the matrix
    \begin{equation*} C = \left[\begin{array}{rrrr} -2 \amp 1 \amp 2 \amp -1 \\ 1 \amp -1 \amp 0 \amp 2 \\ 3 \amp 2 \amp -1 \amp 0 \\ \end{array}\right] \end{equation*}
    and find its \(LU\) factorization. Explain why \(C\) and \(U\) have the same null space and use this observation to find a basis for \(\nul(A)\text{.}\)
Solution.
  1. Python gives us the matrices
    \begin{equation*} P=\left[\begin{array}{rrr} 0 \amp 1 \amp 0 \\ 1 \amp 0 \amp 0 \\ 0 \amp 0 \amp 1 \end{array}\right],~~~ L=\begin{bmatrix} 1 \amp 0 \amp 0 \\ -\frac12 \amp 1 \amp 0 \\ -\frac12 \amp -1 \amp 1 \end{bmatrix},~~~ \begin{bmatrix} -4 \amp 5 \amp 0 \\ 0 \amp -\frac12 \amp 1 \\ 0 \amp 0 \amp 3 \\ \end{bmatrix}. \end{equation*}
  2. We find \(P\bvec=\threevec{-13}8{8}\text{.}\)
  3. Solving \(L\cvec=P\bvec\text{,}\) we have \(\cvec=\threevec{-13}{\frac32}{3}\text{.}\) Finally, solving \(U\xvec=\cvec\text{,}\) we obtain \(\xvec=\threevec2{-1}1\text{.}\)
  4. We find
    \begin{equation*} P\bvec=\threevec{-16}9{10},~~~ \cvec=\threevec{-16}13,~~~ \xvec=\threevec{4}{0}{1}. \end{equation*}
  5. Because \(P\text{,}\) \(L\text{,}\) and \(U\) are invertible, it follows that
    \begin{equation*} \det A = (\det L \det U)/\det P\neq 0\text{,} \end{equation*}
    which means that \(A\) is invertible.
  6. Python tells us that
    \begin{equation*} P = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp 0 \amp 1 \\ 0 \amp 1 \amp 0 \end{array}\right], L = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ \frac{2}{3} \amp 1 \amp 0 \\ \frac{2}{3} \amp -\frac{1}{5} \amp 1 \end{array}\right], U= \left[\begin{array}{rrr} 3 \amp -1 \amp 2 \\ 0 \amp \frac{5}{3} \amp \frac{5}{3} \\ 0 \amp 0 \amp 0 \end{array}\right]\text{.} \end{equation*}
    Because \(U\) is not invertible, we see that
    \begin{equation*} \det B = (\det L\det U)/\det P = 0 \end{equation*}
    so \(B\) is not invertible.
  7. We have
    \begin{equation*} \begin{array}{c} P= \left[\begin{array}{rrr} 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \\ 1 \amp 0 \amp 0 \end{array}\right], L= \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ -\frac{2}{3} \amp 1 \amp 0 \\ \frac{1}{3} \amp -\frac{5}{7} \amp 1 \end{array}\right], \\ U= \left[\begin{array}{rrrr} 3 \amp 2 \amp -1 \amp 0 \\ 0 \amp \frac{7}{3} \amp \frac{4}{3} \amp -1 \\ 0 \amp 0 \amp \frac{9}{7} \amp \frac{9}{7} \end{array}\right] \end{array}\text{.} \end{equation*}
    We may rewrite \(PC=LU\) as \(C=P^{-1}LU\) so if \(C\xvec=P^{-1}LU\xvec=\zerovec\text{,}\) then \(U\xvec=\zerovec\) because \(P\) and \(L\) are invertible.
    Notice that \(U\) is upper triangular, which means that it is straightforward to find its reduced row echelon form:
    \begin{equation*} \begin{array}{rcl} U= \left[\begin{array}{rrrr} 3 \amp 2 \amp -1 \amp 0 \\ 0 \amp \frac{7}{3} \amp \frac{4}{3} \amp -1 \\ 0 \amp 0 \amp \frac{9}{7} \amp \frac{9}{7} \end{array}\right] \amp \sim \amp \left[\begin{array}{rrrr} 3 \amp 2 \amp -1 \amp 0 \\ 0 \amp 7 \amp 4 \amp -3 \\ 0 \amp 0 \amp 1 \amp 1 \end{array}\right] \\ \amp \sim \amp \left[\begin{array}{rrrr} 3 \amp 2 \amp 0 \amp 1 \\ 0 \amp 1 \amp 0 \amp -1 \\ 0 \amp 0 \amp 1 \amp 1 \end{array}\right] \\ \amp \sim \amp \left[\begin{array}{rrrr} 1 \amp 0 \amp 0 \amp 1 \\ 0 \amp 1 \amp 0 \amp -1 \\ 0 \amp 0 \amp 1 \amp 1 \end{array}\right] \\ \end{array}\text{.} \end{equation*}
    This shows that a basis for \(\nul(C)=\nul(U)\) is \(\fourvec{-1}1{-1}1\text{.}\)

Subsection 4.7.3 Summary

We returned to Gaussian elimination, which we have used as a primary tool for finding solutions to linear systems, and explored its practicality, both in terms of numerical accuracy and computational effort.
  • We saw that the accuracy of computations implemented on a computer could be improved using partial pivoting, a technique that performs row interchanges so that the entry in a pivot position has the largest possible magnitude.
  • Beginning with a matrix \(A\text{,}\) we used the Gaussian elimination algorithm to write \(PA = LU\text{,}\) where \(P\) is a permutation matrix, \(L\) is lower triangular, and \(U\) is upper triangular.
  • Finding this factorization involves roughly as much work as performing Gaussian elimination. However, once we have the factorization, we are able to quickly solve equations of the form \(A\xvec = \bvec\) by first solving \(L\cvec = P\bvec\) and then \(U\xvec = \cvec\text{.}\)

Exercises 4.7.4 Exercises

1.

In this section, we saw that errors made in computer arithmetic can produce approximate solutions that are far from the exact solutions. Here is another example in which this can happen. Consider the matrix
\begin{equation*} A = \left[\begin{array}{cc} 1 \amp 1 \\ 1 \amp 1.0001 \\ \end{array}\right]\text{.} \end{equation*}
  1. Find the exact solution to the equation \(A\xvec = \twovec{2}{2}\text{.}\)
  2. Suppose that this linear system arises in the midst of a larger computation except that, due to some error in the computation of the right hand side of the equation, our computer thinks we want to solve \(A\xvec = \ctwovec{2}{2.0001}\text{.}\) Find the solution to this equation and compare it to the solution of the equation in the previous part of this exericse.
Notice how a small change in the right hand side of the equation leads to a large change in the solution. In this case, we say that the matrix \(A\) is ill-conditioned because the solutions are extremely sensitive to small changes in the right hand side of the equation. Though we will not do so here, it is possible to create a measure of the matrix that tells us when a matrix is ill-conditioned. Regrettably, there is not much we can do to remedy this problem.

2.

In this section, we found the \(LU\) factorization of the matrix
\begin{equation*} A = \left[\begin{array}{rrr} 1 \amp 2 \amp 1 \\ -2 \amp -3 \amp -2 \\ 3 \amp 7 \amp 4 \\ \end{array}\right] \end{equation*}
in one of the activities, without using partial pivoting. Apply a sequence of row operations, now using partial pivoting, to find an upper triangular matrix \(U\) that is row equivalent to \(A\text{.}\)

3.

In the following exercises, use the given \(LU\) factorizations to solve the equations \(A\xvec = \bvec\text{.}\)
  1. Solve the equation
    \begin{equation*} A\xvec = \left[\begin{array}{rr} 1 \amp 0 \\ -2 \amp 1 \\ \end{array}\right] \left[\begin{array}{rr} 3 \amp 1 \\ 0 \amp -2 \\ \end{array}\right]\xvec = \twovec{-3}{0}\text{.} \end{equation*}
  2. Solve the equation
    \begin{equation*} A\xvec = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ -2 \amp 1 \amp 0 \\ -1 \amp 2 \amp 1 \\ \end{array}\right] \left[\begin{array}{rrr} 2 \amp 1 \amp 0 \\ 0 \amp -1 \amp 3 \\ 0 \amp 0 \amp 1 \\ \end{array}\right]\xvec = \threevec{5}{-5}{7}\text{.} \end{equation*}

4.

Use Python to solve the following equation by finding a \(PLU\) factorization:
\begin{equation*} \left[\begin{array}{rrr} 3 \amp 4 \amp -1 \\ 2 \amp 4 \amp 1 \\ -3 \amp 1 \amp 4 \\ \end{array}\right] \xvec = \threevec{-3}{-3}{-4}\text{.} \end{equation*}

5.

In practice, one rarely finds the inverse of a matrix \(A\text{.}\) It requires considerable effort to compute, and we can solve any equation of the form \(A\xvec = \bvec\) using an \(LU\) factorization, which means that the inverse isn’t necessary. In any case, the best way to compute an inverse is using an \(LU\) factorization, as this exericse demonstrates.
  1. Suppose that \(PA = LU\text{.}\) Explain why \(A^{-1} = U^{-1}L^{-1}P\text{.}\)
  2. Describe a relatively efficient way to find the inverse of a triangular matrix. (This means using the \(LU\) factorization to find the inverse of \(A\) will also be relatively efficient.)
  3. Explain how this method shows that the inverse of a invertible triangular matrix is triangular (of the same type).
  4. What are the diagonal entries of the inverse of an invertible trianglar matrix?
  5. Consider the matrix
    \begin{equation*} A = \left[\begin{array}{rrr} 3 \amp 4 \amp -1 \\ 2 \amp 4 \amp 1 \\ -3 \amp 1 \amp 4 \\ \end{array}\right]\text{.} \end{equation*}
    Find the \(LU\) factorization of \(A\) and use it to find \(A^{-1}\text{.}\)

6.

Consider the matrix
\begin{equation*} A = \left[\begin{array}{rrrr} a \amp a \amp a \amp a \\ a \amp b \amp b \amp b \\ a \amp b \amp c \amp c \\ a \amp b \amp c \amp d \\ \end{array}\right]\text{.} \end{equation*}
  1. Find the \(LU\) factorization of \(A\text{.}\)
  2. What conditions on \(a\text{,}\) \(b\text{,}\) \(c\text{,}\) and \(d\) guarantee that \(A\) is invertible?

7.

In the \(LU\) factorization of a matrix, the diagonal entries of \(L\) are all \(1\) while the diagonal entries of \(U\) are not necessarily \(1\text{.}\) This exercise will explore that observation by considering the matrix
\begin{equation*} A = \left[\begin{array}{rrr} 3 \amp 1 \amp 1 \\ -6 \amp -4 \amp -1 \\ 0 \amp -4 \amp 1 \\ \end{array}\right]\text{.} \end{equation*}
  1. Perform Gaussian elimination without partial pivoting to find \(U\text{,}\) an upper triangular matrix that is row equivalent to \(A\text{.}\)
  2. The diagonal entries of \(U\) are called pivots. Explain why \(\det A\) equals the product of the pivots.
  3. What is \(\det A\) for our matrix \(A\text{?}\)
  4. More generally, if we have \(PA=LU\text{,}\) explain why \(\det A\) equals plus or minus the product of the pivots.

8.

Please provide a justification to your responses to these questions.
  1. In this section, our hypothetical computer could only store numbers using 3 decimal places. Most computers can store numbers using 15 or more decimal places. Why do we still need to be concerned about the accuracy of our computations when solving systems of linear equations?
  2. Finding the \(LU\) factorization of a matrix \(A\) is roughly the same amount of work as finding its reduced row echelon form. Why is the \(LU\) factorization useful then?
  3. How can we detect whether a matrix is invertible from its \(LU\) factorization?

9.

Consider the matrix
\begin{equation*} A = \left[\begin{array}{rrrr} -1 \amp 1 \amp 0 \amp 0 \\ 1 \amp -2 \amp 1 \amp 0 \\ 0 \amp 1 \amp -2 \amp 1 \\ 0 \amp 0 \amp -1 \amp 1 \\ \end{array}\right]\text{.} \end{equation*}
  1. Find the \(LU\) factorization of \(A\text{.}\)
  2. Use the factorization to find a basis for \(\nul(A)\text{.}\)
  3. We have seen that \(\nul(A) = \nul(U)\text{.}\) Is it true that \(\col(A) = \col(L)\text{?}\)