
Section 4.2 Triangular matrices and Gaussian elimination

With some of the ideas we’ve developed, we can recast the Gaussian elimination algorithm in terms of matrix multiplication and invertibility. These ideas will be especially helpful later when we consider determinants and LU factorizations. Triangular matrices will play an important role.

Subsection 4.2.1 Triangular matrices

Definition 4.2.1.

We say that a matrix \(A\) is lower triangular if all its entries above the diagonal are zero. Similarly, \(A\) is upper triangular if all the entries below the diagonal are zero. A matrix that is both upper and lower triangular is called diagonal because all of its non-zero entries are along the diagonal of the matrix.
For example, the matrix \(L\) below is a lower triangular matrix while \(U\) is upper triangular and \(D\) is diagonal.
\begin{equation*} L = \left[\begin{array}{rrrr} * \amp 0 \amp 0 \amp 0 \\ * \amp * \amp 0 \amp 0 \\ * \amp * \amp * \amp 0 \\ * \amp * \amp * \amp * \\ \end{array}\right], \hspace{20pt} U = \left[\begin{array}{rrrr} * \amp * \amp * \amp * \\ 0 \amp * \amp * \amp * \\ 0 \amp 0 \amp * \amp * \\ 0 \amp 0 \amp 0 \amp * \\ \end{array}\right], \hspace{20pt} D = \left[\begin{array}{rrrr} * \amp 0 \amp 0 \amp 0 \\ 0 \amp * \amp 0 \amp 0 \\ 0 \amp 0 \amp * \amp 0 \\ 0 \amp 0 \amp 0 \amp * \\ \end{array}\right]\text{.} \end{equation*}
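If we want to experiment with these shapes numerically, a minimal sketch in Python with numpy (not a tool the text itself assumes) extracts the triangular and diagonal parts of a matrix:

import numpy as np

A = np.arange(1.0, 17.0).reshape(4, 4)   # a generic 4x4 matrix

L = np.tril(A)            # zero out entries above the diagonal: lower triangular
U = np.triu(A)            # zero out entries below the diagonal: upper triangular
D = np.diag(np.diag(A))   # keep only the diagonal entries: diagonal

print(L)
print(U)
print(D)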
We can develop a simple test to determine whether an \(n\by n\) lower triangular matrix is invertible. Let’s use Gaussian elimination to find the reduced row echelon form of the lower triangular matrix
\begin{equation*} \begin{aligned} \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 2 \amp -2 \amp 0 \\ -3 \amp 4 \amp -4 \\ \end{array}\right] \amp {}\sim{} \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp -2 \amp 0 \\ 0 \amp 4 \amp -4 \\ \end{array}\right] \\ \amp {}\sim{} \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp -2 \amp 0 \\ 0 \amp 0 \amp -4 \\ \end{array}\right] \sim \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{array}\right] \end{aligned}\text{.} \end{equation*}
Because the entries on the diagonal are nonzero, we find a pivot position in every row, which tells us that the matrix is invertible.
If, however, there is a zero entry on the diagonal, the matrix cannot be invertible. Considering the matrix below, we see that having a zero on the diagonal leads to a row without a pivot position.
\begin{equation*} \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 2 \amp 0 \amp 0 \\ -3 \amp 4 \amp -4 \\ \end{array}\right] \sim \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \\ 0 \amp 4 \amp -4 \\ \end{array}\right] \sim \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp -1 \\ 0 \amp 0 \amp 0 \\ \end{array}\right]\text{.} \end{equation*}
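This diagonal test is easy to check numerically. The following sketch, again in Python with numpy, applies it to the lower triangular matrix reduced above; the rank computation gives an independent confirmation since an invertible \(3\by3\) matrix must have three pivot positions.

import numpy as np

L = np.array([[ 1.0,  0.0,  0.0],
              [ 2.0, -2.0,  0.0],
              [-3.0,  4.0, -4.0]])

# A square triangular matrix is invertible exactly when every
# diagonal entry is nonzero.
print(np.all(np.diag(L) != 0))       # True

# Independent check: an invertible 3x3 matrix has rank 3.
print(np.linalg.matrix_rank(L))      # 3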

Subsection 4.2.2 Elementary matrices

Activity 4.2.1. Gaussian elimination and matrix multiplication.

This activity explores how the row operations of scaling, interchange, and replacement can be performed using matrix multiplication.
As an example, we consider the matrix
\begin{equation*} A = \left[\begin{array}{rrr} 1 \amp 2 \amp 1 \\ 2 \amp 0 \amp -2 \\ -1 \amp 2 \amp -1 \\ \end{array}\right] \end{equation*}
and apply a replacement operation that multiplies the first row by \(-2\) and adds it to the second row. Rather than performing this operation in the usual way, we construct a new matrix by applying the desired replacement operation to the identity matrix. To illustrate, we begin with the identity matrix
\begin{equation*} I = \begin{bmatrix} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{bmatrix} \end{equation*}
and form a new matrix by multiplying the first row by \(-2\) and adding it to the second row to obtain
\begin{equation*} R = \begin{bmatrix} 1 \amp 0 \amp 0 \\ -2 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{bmatrix}. \end{equation*}
  1. Show that the product \(RA\) is the result of applying the replacement operation to \(A\text{.}\)
  2. Explain why \(R\) is invertible and find its inverse \(R^{-1}\text{.}\)
  3. Describe the relationship between \(R\) and \(R^{-1}\) and use the connection to replacement operations to explain why it holds.
  4. Other row operations can be performed using a similar procedure. For instance, suppose we want to scale the second row of \(A\) by \(4\text{.}\) Find a matrix \(S\) so that \(SA\) is the same as that obtained from the scaling operation. Why is \(S\) invertible and what is \(S^{-1}\text{?}\)
  5. Finally, suppose we want to interchange the first and third rows of \(A\text{.}\) Find a matrix \(P\text{,}\) usually called a permutation matrix, that performs this operation. What is \(P^{-1}\text{?}\)
  6. Performing three replacement operations on \(A\) shows that it is row equivalent to the upper triangular matrix \(U\text{:}\)
    \begin{equation*} A = \left[\begin{array}{rrr} 1 \amp 2 \amp 1 \\ 2 \amp 0 \amp -2 \\ -1 \amp 2 \amp -1 \\ \end{array}\right] \sim \left[\begin{array}{rrr} 1 \amp 2 \amp 1 \\ 0 \amp -4 \amp -4 \\ 0 \amp 0 \amp -4 \\ \end{array}\right] = U. \end{equation*}
    Find the matrices \(L_1\text{,}\) \(L_2\text{,}\) and \(L_3\) that perform these row replacement operations so that \(L_3L_2L_1 A = U\text{.}\)
  7. Explain why the matrix product \(L_3L_2L_1\) is invertible and use this fact to write \(A = LU\text{.}\) What is the matrix \(L\) that you find? Why do you think we denote it by \(L\text{?}\)
Solution.
  1. Performing the matrix multiplication, we find that
    \begin{equation*} RA = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ -2 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{array}\right] \left[\begin{array}{rrr} 1 \amp 2 \amp 1 \\ 2 \amp 0 \amp -2 \\ -1 \amp 2 \amp -1 \\ \end{array}\right] = \left[\begin{array}{rrr} 1 \amp 2 \amp 1 \\ 0 \amp -4 \amp -4 \\ -1 \amp 2 \amp -1 \\ \end{array}\right]\text{.} \end{equation*}
  2. We know that \(R\) is invertible because it is a lower triangular matrix whose diagonal entries are all 1. We find that \(R^{-1} = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 2 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{array}\right] \text{,}\) which can be verified by checking that \(RR^{-1} = I\text{.}\)
  3. The inverse \(R^{-1}\) performs the opposite replacement operation: it multiplies the first row by \(2\) and adds it to the second row. This holds because the replacement operation is reversible; that is, multiplying the first row by \(-2\) and adding it to the second row can be undone by multiplying the first row by \(2\) and adding it to the second row. Symbolically, if \(f\) and \(s\) denote the first and second rows,
    \begin{equation*} 2f + (-2f + s) = s\text{.} \end{equation*}
  4. We find that
    \begin{equation*} S = \begin{bmatrix} 1 \amp 0 \amp 0 \\ 0 \amp 4 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{bmatrix},~~~ S^{-1} = \begin{bmatrix} 1 \amp 0 \amp 0 \\ 0 \amp \frac14 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{bmatrix}. \end{equation*}
    This makes sense because scaling a row by \(4\) can be undone by scaling the same row by \(\frac14\text{.}\)
  5. We find that
    \begin{equation*} P = \begin{bmatrix} 0 \amp 0 \amp 1 \\ 0 \amp 1 \amp 0 \\ 1 \amp 0 \amp 0 \\ \end{bmatrix}. \end{equation*}
    Moreover, \(P=P^{-1}\) because we can undo the interchange operation by repeating it.
  6. Continuing with the Gaussian elimination algorithm, we have \(L_1 = R\text{,}\) as above,
    \begin{equation*} L_2 = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 1 \amp 0 \amp 1 \\ \end{array}\right],~~~ L_3 = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 1 \amp 1 \\ \end{array}\right]\text{.} \end{equation*}
    We then have \(L_3L_2L_1A = U\text{.}\)
  7. Each of the matrices \(L_1\text{,}\) \(L_2\text{,}\) and \(L_3\) is invertible so their product will be as well. Since \((L_3L_2L_1)A = U\text{,}\) we have \(A = (L_3L_2L_1)^{-1}U\text{.}\) Moreover, \(L = (L_3L_2L_1)^{-1} = L_1^{-1}L_2^{-1}L_3^{-1}\) gives \(L=\left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 2 \amp 1 \amp 0 \\ -1 \amp -1 \amp 1 \\ \end{array}\right]\text{.}\) Notice that this matrix is lower triangular so we call it \(L\text{.}\)
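The factorization found in this activity can be verified numerically. The following Python/numpy sketch multiplies out \(L_3L_2L_1A\) and then recovers \(L\) by inverting the product, reproducing the matrices found above:

import numpy as np

A = np.array([[ 1, 2,  1],
              [ 2, 0, -2],
              [-1, 2, -1]])

L1 = np.array([[1, 0, 0], [-2, 1, 0], [0, 0, 1]])   # -2*(row 1) added to row 2
L2 = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 1]])    # row 1 added to row 3
L3 = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 1]])    # row 2 added to row 3

U = L3 @ L2 @ L1 @ A
print(U)                         # [[1, 2, 1], [0, -4, -4], [0, 0, -4]]

L = np.linalg.inv(L3 @ L2 @ L1)
print(L)                         # [[1, 0, 0], [2, 1, 0], [-1, -1, 1]]
print(np.allclose(L @ U, A))     # True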
The following are examples of matrices, known as elementary matrices, that perform the row operations on a matrix having three rows.
Replacement
Multiplying the second row by 3 and adding it to the third row is performed by
\begin{equation*} R = \begin{bmatrix} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 3 \amp 1 \\ \end{bmatrix}. \end{equation*}
These matrices always have 1’s along the diagonal and exactly one additional non-zero entry, and they are always triangular. During the forward phase of Gaussian elimination, they will be lower triangular, so we will sometimes use \(L\) to emphasize this.
Scaling
Multiplying the third row by 2 is performed by
\begin{equation*} S = \begin{bmatrix} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 2 \\ \end{bmatrix}. \end{equation*}
Scaling matrices are diagonal with at most one entry that is not a 1.
Interchange
Interchanging the first two rows is performed by
\begin{equation*} P = \begin{bmatrix} 0 \amp 1 \amp 0 \\ 1 \amp 0 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{bmatrix}. \end{equation*}
We use \(P\) rather than \(I\) because we reserve \(I\) for the identity matrix. \(P\) stands for permutation, since an interchange permutes (changes the order of) the rows. These matrices are formed by swapping two rows of the identity matrix.
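Because every elementary matrix is obtained by applying a single row operation to the identity matrix, it is straightforward to generate them programmatically. Here is a minimal Python/numpy sketch (the function names and the 0-indexed row convention are our own choices) that builds the three types shown above:

import numpy as np

def replacement(n, i, j, s):
    """Elementary matrix adding s times row j to row i (rows 0-indexed)."""
    E = np.eye(n)
    E[i, j] = s
    return E

def scaling(n, i, s):
    """Elementary matrix scaling row i by s."""
    E = np.eye(n)
    E[i, i] = s
    return E

def interchange(n, i, j):
    """Elementary matrix swapping rows i and j."""
    E = np.eye(n)
    E[[i, j]] = E[[j, i]]
    return E

R = replacement(3, 2, 1, 3)   # 3*(second row) added to the third row
S = scaling(3, 2, 2)          # third row scaled by 2
P = interchange(3, 0, 1)      # first two rows interchanged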

Example 4.2.3.

Suppose we have
\begin{equation*} A = \left[\begin{array}{rrr} 1 \amp 3 \amp -2 \\ -3 \amp -6 \amp 3 \\ 2 \amp 0 \amp -2 \\ \end{array}\right]. \end{equation*}
For the forward phase of Gaussian elimination, we perform a sequence of three replacement operations. The first replacement operation multiplies the first row by \(3\) and adds the result to the second row. We can perform this operation by multiplying \(A\) by the lower triangular matrix \(L_1\) where
\begin{equation*} L_1A = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 3 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{array}\right] \left[\begin{array}{rrr} 1 \amp 3 \amp -2 \\ -3 \amp -6 \amp 3 \\ 2 \amp 0 \amp -2 \\ \end{array}\right] = \left[\begin{array}{rrr} 1 \amp 3 \amp -2 \\ 0 \amp 3 \amp -3 \\ 2 \amp 0 \amp -2 \\ \end{array}\right]. \end{equation*}
The next two replacement operations are performed by the matrices
\begin{equation*} L_2 = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ -2 \amp 0 \amp 1 \\ \end{array}\right], \hspace{24pt} L_3 = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 2 \amp 1 \\ \end{array}\right] \end{equation*}
so that
\begin{equation*} L_3L_2L_1A = U = \begin{bmatrix} 1 \amp 3 \amp -2 \\ 0 \amp 3 \amp -3 \\ 0 \amp 0 \amp -4 \\ \end{bmatrix}. \end{equation*}
Notice that the inverse of \(L_1\) has the simple form:
\begin{equation*} L_1 = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 3 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{array}\right], \hspace{24pt} L_1^{-1} = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ -3 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{array}\right]\text{.} \end{equation*}
This says that if we want to undo the operation of multiplying the first row by \(3\) and adding to the second row, we should multiply the first row by \(-3\) and add it to the second row. We can see this by simple algebra.
\begin{equation*} \hat{R}_2 = 3 R_1 + R_2 \Rightarrow R_2 = -3 R_1 + \hat{R}_2 \end{equation*}
That is the effect of \(L_1^{-1}\text{,}\) which is obtained by negating the off-diagonal entry of \(L_1\text{.}\)
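We can confirm this numerically: negating the off-diagonal entry of \(L_1\) produces a matrix whose product with \(L_1\) is the identity. A quick numpy check:

import numpy as np

L1 = np.array([[1, 0, 0],
               [3, 1, 0],
               [0, 0, 1]])

L1_inv = L1.copy()
L1_inv[1, 0] = -3      # negate the off-diagonal entry

print(L1 @ L1_inv)     # the 3x3 identity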
Notice that we now have \(L_3L_2L_1A = U\text{,}\) which gives
\begin{equation*} \begin{aligned} (L_3L_2L_1)A \amp = U \\ (L_3L_2L_1)^{-1}(L_3L_2L_1)A \amp = (L_3L_2L_1)^{-1}U \\ A \amp = (L_3L_2L_1)^{-1}U = LU\\ \end{aligned} \end{equation*}
where \(L\) is the lower triangular matrix
\begin{equation*} L = (L_3L_2L_1)^{-1}= L_1^{-1} L_2^{-1} L_3^{-1} = \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ -3 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{array}\right] \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 2 \amp 0 \amp 1 \\ \end{array}\right] \left[\begin{array}{rrr} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp -2 \amp 1 \\ \end{array}\right] = \begin{bmatrix} 1 \amp 0 \amp 0 \\ -3 \amp 1 \amp 0 \\ 2 \amp -2 \amp 1 \\ \end{bmatrix}. \end{equation*}
This way of writing \(A=LU\) as the product of a lower and an upper triangular matrix is known as an \(LU\) factorization of \(A\text{,}\) and its usefulness will be explored in Section 4.7.
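As a final check on this example, a short numpy computation (a sketch, not part of the text's development) confirms that the product \(LU\) reproduces \(A\text{:}\)

import numpy as np

A = np.array([[ 1,  3, -2],
              [-3, -6,  3],
              [ 2,  0, -2]])
L = np.array([[ 1,  0, 0],
              [-3,  1, 0],
              [ 2, -2, 1]])
U = np.array([[1, 3, -2],
              [0, 3, -3],
              [0, 0, -4]])

print(np.array_equal(L @ U, A))   # True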

Subsection 4.2.3 Summary

In this section, we learned that the elementary row operations of replacement, scaling, and interchange can be performed via multiplication by elementary matrices. Elementary matrices can be formed as slight modifications of identity matrices.
  • Replacement: The elementary matrices for replacement differ from an identity matrix in exactly one entry. These matrices are triangular with one off-diagonal entry that is non-zero. (For the replacements needed during the forward phase of Gaussian elimination, these will all be lower triangular.)
  • Scaling: The elementary matrices for scaling also differ from an identity matrix in exactly one entry. These matrices are diagonal with one diagonal entry replaced by the scaling factor.
  • Interchange: The elementary matrices for interchange are formed by swapping two rows of an identity matrix. (This changes two 0s to 1s and two 1s to 0s.)
The inverse of any elementary matrix also has a simple form that is easily determined.
  • Replacement: Flip the sign on the off-diagonal entry.
  • Scaling by \(s\text{:}\) Scale by \(\frac{1}{s}\) to undo the scaling.
  • Interchange: Interchange matrices are their own inverses.
Along the way, we saw that triangular matrices also have a nice property, namely
  • Square triangular matrices are invertible if and only if all the diagonal entries are non-zero. Furthermore, their inverses are easily computed via back substitution.
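For instance, the inverse of an upper triangular matrix can be found one column at a time by back substitution. The sketch below uses scipy.linalg.solve_triangular for this (using scipy here is our choice; any back substitution routine would do):

import numpy as np
from scipy.linalg import solve_triangular

U = np.array([[1.0, 3.0, -2.0],
              [0.0, 3.0, -3.0],
              [0.0, 0.0, -4.0]])

# Solving U X = I one column at a time is exactly back substitution.
U_inv = solve_triangular(U, np.eye(3))

print(np.allclose(U_inv @ U, np.eye(3)))   # True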

Exercises 4.2.4 Exercises

1.

If a matrix \(A\) is invertible, there is a sequence of row operations that transforms \(A\) into the identity matrix \(I\text{.}\) We have seen that every row operation can be performed by matrix multiplication. If the \(j^{th}\) step in the Gaussian elimination process is performed by multiplying by \(E_j\text{,}\) then we have
\begin{equation*} E_p\cdots E_2E_1 A = I\text{,} \end{equation*}
which means that
\begin{equation*} A^{-1} = E_p\cdots E_2E_1\text{.} \end{equation*}
For each of the following matrices, find a sequence of row operations that transforms the matrix to the identity \(I\text{.}\) Write the matrices \(E_j\) that perform the steps and use them to find \(A^{-1}\text{.}\)
  1. \begin{equation*} A = \left[\begin{array}{rrr} 0 \amp 2 \amp 0 \\ -3 \amp 0 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{array}\right]\text{.} \end{equation*}
  2. \begin{equation*} A = \left[\begin{array}{rrrr} 1 \amp 0 \amp 0 \amp 0 \\ 2 \amp 1 \amp 0 \amp 0 \\ 0 \amp -3 \amp 1 \amp 0 \\ 0 \amp 0 \amp 2 \amp 1 \\ \end{array}\right]\text{.} \end{equation*}
  3. \begin{equation*} A = \left[\begin{array}{rrr} 1 \amp 1 \amp 1 \\ 0 \amp 1 \amp 1 \\ 0 \amp 0 \amp 2 \\ \end{array}\right]\text{.} \end{equation*}

2.

Suppose that we start with a \(3\by3\) matrix \(A\) and perform the following sequence of row operations:
  1. Multiply row 1 by -2 and add to row 2.
  2. Multiply row 1 by 4 and add to row 3.
  3. Scale row 2 by \(1/2\text{.}\)
  4. Multiply row 2 by -1 and add to row 3,
and arrive at the upper triangular matrix
\begin{equation*} U = \left[\begin{array}{rrr} 3 \amp 2 \amp -1 \\ 0 \amp 1 \amp 3 \\ 0 \amp 0 \amp -4 \\ \end{array}\right]. \end{equation*}
  1. Write the matrices \(E_1\text{,}\) \(E_2\text{,}\) \(E_3\text{,}\) and \(E_4\) that perform the four row operations.
  2. Find the matrix \(E = E_4E_3E_2E_1\text{.}\)
  3. We then have \(E_4E_3E_2E_1 A = EA = U\text{.}\) Now that we have the matrix \(E\text{,}\) find the original matrix \(A = E^{-1}U = E_1^{-1} E_2^{-1} E_3^{-1} E_4^{-1} U \text{.}\)