
Section 7.3 Quadratic forms

With our understanding of symmetric matrices and variance in hand, we’ll now explore how to determine the directions in which the variance of a dataset is as large as possible and where it is as small as possible. This is part of a much larger story involving a type of function, called a quadratic form, that we’ll introduce here.

Preview Activity 7.3.1.

Let’s begin by looking at an example. Suppose we have a demeaned column-variate data matrix
\begin{equation*} \Xtilde = \begin{bmatrix} 2 \amp 1 \\ 1 \amp 2 \\ -3 \amp -3 \\ \end{bmatrix} \end{equation*}
  1. Plot the demeaned case vectors as points in Figure 7.3.1. In which direction does the variance appear to be largest and in which does it appear to be smallest?
    Figure 7.3.1. Use this coordinate grid to plot the demeaned case vectors as points.
  2. Construct the covariance matrix \(S_{XX}\) and determine the variance in the direction of \(\twovec11\) and the variance in the direction of \(\twovec{-1}1\text{.}\)
  3. What is the total variance of this dataset?
  4. Generally speaking, if \(S_{XX}\) is the covariance matrix of a dataset and \(\uvec\) is an eigenvector of \(S_{XX}\) having unit length and with associated eigenvalue \(\lambda\text{,}\) what is \(V_{\uvec}\text{?}\)
Solution.
  1. The variance appears to be greatest in the direction of \(\twovec11\) and smallest in the direction of \(\twovec{-1}1\text{.}\)
  2. In the direction of \(\twovec11\text{,}\) the variance is \(27/2\text{,}\) while in the direction of \(\twovec{-1}1\text{,}\) the variance is \(1/2\text{.}\)
  3. The total variance is \(27/2 + 1/2 = 28/2 = 14\text{.}\)
  4. \(\displaystyle V_{\uvec} = \uvec\cdot(S_{XX} \uvec) = \lambda\uvec\cdot\uvec = \lambda\)
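These computations can be checked in Sage. The following cell is a sketch (the variable names are ours): it builds the covariance matrix from the demeaned data matrix and evaluates the variance in each of the two directions.

    X = matrix(QQ, [[2, 1], [1, 2], [-3, -3]])   # demeaned data matrix
    S = X.transpose() * X / (3 - 1)              # covariance matrix, n = 3 cases
    print(S)                                     # [[7, 13/2], [13/2, 7]]

    u = vector([1, 1]) / sqrt(2)                 # unit vector along (1, 1)
    w = vector([-1, 1]) / sqrt(2)                # unit vector along (-1, 1)
    print(u * (S * u))                           # 27/2, the variance along u
    print(w * (S * w))                           # 1/2, the variance along w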

Subsection 7.3.1 Quadratic forms

Given a demeaned column-variate data matrix \(\Xtilde\) with \(n\) rows, we can use the symmetric covariance matrix \(S=\frac1{n-1} \Xtilde^{\transpose} \Xtilde\) to compute the variance in the direction specified by a unit vector \(\uvec\) using
\begin{equation*} V_{\uvec} = \uvec\cdot(S\uvec)\text{.} \end{equation*}
More generally, a symmetric \(m\by m\) matrix \(A\) defines a function \(q:\real^m \to \real\) by
\begin{equation*} q(\xvec) = \xvec\cdot(A\xvec). \end{equation*}
Notice that this expression is similar to the one we use to find the variance \(V_{\uvec}\) in terms of the covariance matrix \(S\text{.}\) The difference is that we allow \(\xvec\) to be any vector rather than requiring it to be a unit vector and allow \(A\) to be any symmetric matrix. (Not all symmetric matrices are covariance matrices. In particular, covariance matrices never have negative values on the diagonal.)

Example 7.3.2.

Suppose that \(A=\begin{bmatrix} 1 \amp 2\\ 2 \amp 1 \end{bmatrix} \text{.}\) If we write \(\xvec=\twovec{x_1}{x_2}\text{,}\) then we have
\begin{align*} q\left(\twovec {x_1}{x_2}\right) \amp = \twovec {x_1}{x_2} \cdot \left( \begin{bmatrix} 1 \amp 2 \\ 2 \amp 1 \end{bmatrix} \twovec {x_1}{x_2} \right)\\ \amp = \twovec {x_1}{x_2} \cdot \twovec{x_1 + 2x_2}{2x_1 + x_2}\\ \amp = x_1^2 + 2x_1x_2 + 2x_1x_2 + x_2^2\\ \amp = x_1^2 + 4x_1x_2 + x_2^2. \end{align*}
We may evaluate the quadratic form using some input vectors:
\begin{equation*} q\left(\twovec 10\right) = 1, \hspace{24pt} q\left(\twovec 11\right) = 6, \hspace{24pt} q\left(\twovec 24\right) = 52. \end{equation*}
Notice that the value of the quadratic form is a scalar.

Definition 7.3.3.

If \(A\) is a symmetric \(m\by m\) matrix, the quadratic form defined by \(A\) is the function \(q_A(\xvec) = \xvec\cdot(A\xvec)\text{.}\)
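In Sage, a quadratic form can be evaluated directly from this definition. Here is a minimal sketch using the matrix of Example 7.3.2; the helper q is our own, not a built-in command.

    A = matrix(2, 2, [1, 2, 2, 1])    # the symmetric matrix from Example 7.3.2

    def q(A, x):
        # the quadratic form q_A(x) = x . (Ax); for Sage vectors,
        # v * w is the dot product
        return x * (A * x)

    print(q(A, vector([1, 0])))       # 1
    print(q(A, vector([1, 1])))       # 6
    print(q(A, vector([2, 4])))       # 52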

Activity 7.3.2.

Let’s look at some more examples of quadratic forms.
  1. Consider the symmetric matrix \(D = \begin{bmatrix} 3 \amp 0 \\ 0 \amp -1 \\ \end{bmatrix} \text{.}\) Write the quadratic form \(q_D(\xvec)\) defined by \(D\) in terms of the components of \(\xvec=\twovec{x_1}{x_2}\text{.}\) What is the value of \(q_D\left(\twovec2{-4}\right)\text{?}\)
  2. Given the symmetric matrix \(A=\begin{bmatrix} 2 \amp 5 \\ 5 \amp -3 \end{bmatrix} \text{,}\) write the quadratic form \(q_A(\xvec)\) defined by \(A\) and evaluate \(q_A\left(\twovec{2}{-1}\right)\text{.}\)
  3. Suppose that \(q\left(\twovec{x_1}{x_2}\right) = 3x_1^2 - 4x_1x_2 + 4x_2^2\text{.}\) Find a symmetric matrix \(A\) such that \(q\) is the quadratic form defined by \(A\text{.}\)
  4. Suppose that \(q\) is a quadratic form and that \(q(\xvec) = 3\text{.}\) What is \(q(2\xvec)\text{?}\) \(q(-\xvec)\text{?}\) \(q(10\xvec)\text{?}\)
  5. Suppose that \(A\) is a symmetric matrix and \(q_A(\xvec)\) is the quadratic form defined by \(A\text{.}\) Suppose that \(\xvec\) is an eigenvector of \(A\) with associated eigenvalue -4 and with length 7. What is \(q_A(\xvec)\text{?}\)
Answer.
  1. \(q_D(\xvec) = 3x_1^2 - x_2^2\) and \(q_D\left(\twovec2{-4}\right) = -4\)
  2. \(q_A(\xvec) = 2x_1^2 + 10x_1x_2-3x_2^2\) and \(q_A\left(\twovec2{-1}\right) = -15\)
  3. \(\displaystyle A=\begin{bmatrix} 3 \amp -2 \\ -2 \amp 4 \\ \end{bmatrix}\)
  4. \(q(2\xvec) = 12\text{,}\) \(q(-\xvec) = 3\) and \(q(10\xvec)=300\)
  5. \(\displaystyle q_A(\xvec) = -196\)
Solution.
  1. \(q_D(\xvec) = 3x_1^2 - x_2^2\) and \(q_D\left(\twovec2{-4}\right) = -4\)
  2. \(q_A(\xvec) = 2x_1^2 + 10x_1x_2-3x_2^2\) and \(q_A\left(\twovec2{-1}\right) = -15\)
  3. \(\displaystyle A=\begin{bmatrix} 3 \amp -2 \\ -2 \amp 4 \\ \end{bmatrix}\)
  4. Notice that \(q(2\xvec) = (2\xvec)\cdot(A(2\xvec)) = 4\xvec\cdot(A\xvec) = 4q(\xvec)\text{.}\) In the same way, we have \(q(-\xvec) = q(\xvec)\) and \(q(10\xvec) = 100q(\xvec)\text{.}\)
  5. \(\displaystyle q_A(\xvec) = \xvec\cdot(A\xvec) = -4\xvec\cdot\xvec = -4(49) = -196\)
Linear algebra is principally about things that are linear. However, quadratic forms, as the name implies, have a distinctly non-linear character. First, if \(A=\begin{bmatrix} a \amp b \\ b \amp c \end{bmatrix}\) is a symmetric matrix, then the associated quadratic form is
\begin{equation*} q_A\left(\twovec{x_1}{x_2}\right) = ax_1^2 + 2bx_1x_2 + cx_2^2. \end{equation*}
Notice how the variables \(x_1\) and \(x_2\) are multiplied together, which tells us this isn’t a linear function.
This expression assumes an especially simple form when \(D\) is a diagonal matrix. In particular, if \(D = \begin{bmatrix} a \amp 0 \\ 0 \amp c \\ \end{bmatrix} \text{,}\) then \(q_D\left(\twovec{x_1}{x_2}\right) = ax_1^2 + cx_2^2\text{.}\) This is special because there is no cross-term involving \(x_1x_2\text{.}\)
Remember that matrix transformations have the property that \(T(s\xvec) = sT(\xvec)\text{.}\) Quadratic forms behave differently:
\begin{equation*} q_A(s\xvec) = (s\xvec)\cdot(A(s\xvec)) = s^2\xvec\cdot(A\xvec)= s^2q_A(\xvec). \end{equation*}
For instance, when we multiply \(\xvec\) by the scalar 2, then \(q_A(2\xvec) = 4q_A(\xvec)\text{.}\) Also, notice that \(q_A(-\xvec) = q_A(\xvec)\) since the scalar is squared.
Finally, evaluating a quadratic form on an eigenvector has a particularly simple form. Suppose that \(\xvec\) is an eigenvector of \(A\) with associated eigenvalue \(\lambda\text{.}\) We then have
\begin{equation*} q_A(\xvec) = \xvec\cdot(A\xvec) = \lambda\xvec\cdot\xvec = \lambda \len{\xvec}^2. \end{equation*}
Let’s now return to our motivating question: in which direction \(\uvec\) is the variance \(V_{\uvec}=\uvec\cdot(S\uvec)\) of a dataset as large as possible, and in which is it as small as possible? Remembering that the vector \(\uvec\) is a unit vector, we can now state a more general form of this question: if \(q_A(\xvec)\) is a quadratic form, for which unit vectors \(\uvec\) is \(q_A(\uvec)=\uvec\cdot(A\uvec)\) as large as possible and for which is it as small as possible? Since a unit vector specifies a direction, we will often ask for the directions in which the quadratic form \(q(\xvec)\) is at its maximum or minimum value.

Activity 7.3.3.

We can gain some intuition about this problem by graphing the quadratic form and paying particular attention to the unit vectors.
  1. Evaluating the following cell (a sketch of which appears after these questions) defines the matrix \(D = \begin{bmatrix} 3 \amp 0 \\ 0 \amp -1 \end{bmatrix}\) and displays the graph of the associated quadratic form \(q_D(\xvec)\text{.}\) In addition, the points corresponding to vectors \(\uvec\) with unit length are displayed as a curve.
    Notice that the matrix \(D\) is diagonal. In which directions does the quadratic form have its maximum and minimum values?
  2. Write the quadratic form \(q_D\) associated to \(D\text{.}\) What is the value of \(q_D\left(\twovec10\right)\text{?}\) What is the value of \(q_D\left(\twovec01\right)\text{?}\)
  3. Consider a unit vector \(\uvec=\twovec{u_1}{u_2}\) so that \(u_1^2+u_2^2 = 1\text{,}\) an expression we can rewrite as \(u_1^2 = 1-u_2^2\text{.}\) Write the quadratic form \(q_D(\uvec)\) and replace \(u_1^2\) by \(1-u_2^2\text{.}\) Now explain why the maximum of \(q_D(\uvec)\) is 3. In which direction does the maximum occur? Does this agree with what you observed from the graph above?
  4. Write the quadratic form \(q_D(\uvec)\) and replace \(u_2^2\) by \(1-u_1^2\text{.}\) What is the minimum value of \(q_D(\uvec)\) and in which direction does the minimum occur?
  5. Use the previous Sage cell to change the matrix to \(A=\begin{bmatrix} 1 \amp 2 \\ 2 \amp 1 \end{bmatrix}\) and display the graph of the quadratic form \(q_A(\xvec) = \xvec\cdot(A\xvec)\text{.}\) In which directions do the maximum and minimum occur?
  6. Remember that \(A=\begin{bmatrix} 1 \amp 2 \\ 2 \amp 1 \end{bmatrix}\) is symmetric so that \(A=QDQ^{\transpose}\) where \(D\) is the diagonal matrix above and \(Q\) is the orthogonal matrix that rotates vectors by \(45^\circ\text{.}\) Notice that
    \begin{equation*} q_A(\uvec) = \uvec\cdot(A\uvec) = \uvec\cdot(QDQ^{\transpose}\uvec) = (Q^{\transpose}\uvec)\cdot(DQ^{\transpose}\uvec) = q_D(\vvec) \end{equation*}
    where \(\vvec=Q^{\transpose}\uvec\text{.}\) That is, we have \(q_A(\uvec) = q_D(\vvec)\text{.}\)
    Explain why \(\vvec = Q^{\transpose}\uvec\) is also a unit vector; that is, explain why
    \begin{equation*} |\vvec|^2 = |Q^{\transpose}\uvec|^2 = (Q^{\transpose}\uvec)\cdot(Q^{\transpose}\uvec) = 1. \end{equation*}
  7. Using the fact that \(q_A(\uvec) = q_D(\vvec)\text{,}\) explain how we now know the maximum value of \(q_A(\uvec)\) is 3 and determine the direction in which it occurs. Also, determine the minimum value of \(q_A(\uvec)\) and determine the direction in which it occurs.
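The interactive cell is not reproduced in this text, so the following Sage sketch stands in for it; the plotting window and styling options are assumptions. It graphs \(q_D\) as a surface and draws the image of the unit circle on that surface.

    var('x1 x2 t')
    D = matrix(2, 2, [3, 0, 0, -1])

    # the quadratic form q_D as a symbolic expression
    q = vector([x1, x2]) * D * vector([x1, x2])

    # graph of q_D over a square, with the image of the unit circle
    surface = plot3d(q, (x1, -1.5, 1.5), (x2, -1.5, 1.5), opacity=0.7)
    circle = parametric_plot3d(
        (cos(t), sin(t), q.subs(x1=cos(t), x2=sin(t))),
        (t, 0, 2*pi), color='red', thickness=4)
    show(surface + circle)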
Answer.
  1. The maximum appears to occur in the direction of \(\twovec10\) and the minimum appears to occur in the direction of \(\twovec01\text{.}\)
  2. \(q_D(\xvec) = 3x_1^2 - x_2^2\) so that \(q_D\left(\twovec10\right) = 3\) and \(q_D\left(\twovec01\right) = -1\text{.}\)
  3. The maximum value of \(q_D(\uvec)\) is \(3\) in the direction \(\twovec10\text{.}\)
  4. The minimum value of \(q_D(\uvec)\) is \(-1\) in the direction \(\twovec01\text{.}\)
  5. The maximum appears to occur in the direction \(\twovec11\) and the minimum in the direction \(\twovec{-1}1\text{.}\)
  6. Use the fact that \(QQ^{\transpose}=I\text{.}\)
  7. The maximum of \(q_A(\uvec)\) is \(3\text{,}\) which occurs when \(\uvec= \twovec{1/\sqrt{2}}{1/\sqrt{2}}\text{.}\)
    The minimum of \(q_A(\uvec)\) is \(-1\text{,}\) which occurs when \(\uvec= \twovec{-1/\sqrt{2}}{1/\sqrt{2}}\text{.}\)
Solution.
  1. The maximum appears to occur in the direction of \(\twovec10\) and the minimum appears to occur in the direction of \(\twovec01\text{.}\)
  2. \(q_D(\xvec) = 3x_1^2 - x_2^2\) so that \(q_D\left(\twovec10\right) = 3\) and \(q_D\left(\twovec01\right) = -1\text{.}\)
  3. We have \(q_D(\uvec) = 3u_1^2 - u_2^2 = 3 - 4u_2^2\) so that the quadratic form only depends on \(u_2\text{.}\) The graph of this function of \(u_2\) is a parabola that has a maximum of \(3\) when \(u_2=0\text{.}\) Since \(u_1^2 = 1-u_2^2\text{,}\) this means that the maximum occurs when \(u_1=\pm 1\text{.}\) We therefore see that the maximum value of \(q_D(\uvec)\) is \(3\) in the direction \(\twovec10\) as we saw from the graph.
  4. Now \(q_D(\uvec) = -1 + 4u_1^2\text{,}\) which has a minimum value of \(-1\) when \(u_1=0\text{.}\) Therefore, the minimum value of \(q_D(\uvec)\) is \(-1\) in the direction \(\twovec01\text{.}\)
  5. The graph of \(q_A\) appears to be similar to the graph of \(q_D\) only rotated by \(45^\circ\text{.}\) This means the maximum appears to occur in the direction \(\twovec11\) and the minimum in the direction \(\twovec{-1}1\text{.}\)
  6. Since \(Q\) is orthogonal, we have \(QQ^{\transpose}=I\) so that
    \begin{equation*} |\vvec|^2 = (Q^{\transpose}\uvec)\cdot(Q^{\transpose}\uvec) = \uvec\cdot(QQ^{\transpose}\uvec) = \uvec\cdot\uvec = |\uvec|^2 = 1\text{.} \end{equation*}
  7. Since \(q_A(\uvec) = q_D(\vvec)\text{,}\) the maximum of \(q_A(\uvec)\) is \(3\text{,}\) which occurs when \(\vvec = Q^{\transpose}\uvec = \twovec10\text{.}\) This means that \(\uvec=Q\twovec10 = \twovec{1/\sqrt{2}}{1/\sqrt{2}}\text{,}\) the eigenvector of \(A\) associated to \(\lambda=3\text{.}\)
    In the same way, the minimum value of \(q_A(\uvec)\) is \(-1\text{,}\) which occurs when \(\vvec=Q^{\transpose}\uvec = \twovec01\) so that \(\uvec=Q\twovec01 = \twovec{-1/\sqrt{2}}{1/\sqrt{2}}\text{,}\) the eigenvector of \(A\) associated to \(\lambda=-1\text{.}\)
This activity demonstrates how the eigenvalues of \(A\) determine the maximum and minimum values of the quadratic form \(q_A(\uvec)\) when evaluated on unit vectors and how the associated eigenvectors determine the directions in which the maximum and minimum values occur. Let’s look at another example so that this connection is clear.

Example 7.3.4.

Consider the symmetric matrix \(A=\begin{bmatrix} -7 \amp -6 \\ -6 \amp 2 \\ \end{bmatrix}\text{.}\) Because \(A\) is symmetric, we know that it can be orthogonally diagonalized. In fact, we have \(A=QDQ^{\transpose}\) where
\begin{equation*} D = \begin{bmatrix} 5 \amp 0 \\ 0 \amp -10 \\ \end{bmatrix},\hspace{24pt} Q = \begin{bmatrix} 1/\sqrt{5} \amp 2/\sqrt{5} \\ -2/\sqrt{5} \amp 1/\sqrt{5} \\ \end{bmatrix}\text{.} \end{equation*}
From this diagonalization, we know that \(\lambda_1=5\) is the largest eigenvalue of \(A\) with associated eigenvector \(\uvec_1 = \twovec{1/\sqrt{5}}{-2/\sqrt{5}}\) and that \(\lambda_2 = -10\) is the smallest eigenvalue with associated eigenvector \(\uvec_2 = \twovec{2/\sqrt{5}}{1/\sqrt{5}}\text{.}\)
Let’s first study the quadratic form \(q_D(\uvec) = 5u_1^2 - 10u_2^2\) because the absence of the cross-term makes it comparatively simple. Remembering that \(\uvec\) is a unit vector, we have \(u_1^2+u_2^2=1\text{,}\) which means that \(u_1^2 = 1-u_2^2\text{.}\) Therefore,
\begin{equation*} q_D(\uvec) = 5u_1^2 - 10u_2^2 = 5(1-u_2^2)-10u_2^2 = 5 - 15u_2^2\text{.} \end{equation*}
This tells us that \(q_D(\uvec)\) has a maximum value of \(5\text{,}\) which occurs when \(u_2=0\text{,}\) that is, in the direction \(\twovec10\text{.}\)
In the same way, rewriting \(u_2^2 = 1-u_1^2\) allows us to conclude that the minimum value of \(q_D(\uvec)\) is \(-10\text{,}\) which occurs in the direction \(\twovec01\text{.}\)
Let’s now return to the matrix \(A\) whose quadratic form \(q_A\) is related to \(q_D\) because \(A = QDQ^{\transpose}\text{.}\) In particular, we have
\begin{equation*} q_A(\uvec) = \uvec\cdot(A\uvec) = \uvec\cdot(QDQ^{\transpose}\uvec) = (Q^{\transpose}\uvec)\cdot(DQ^{\transpose}\uvec) = \vvec\cdot(D\vvec) = q_D(\vvec)\text{.} \end{equation*}
In other words, we have \(q_A(\uvec) = q_D(\vvec)\) where \(\vvec=Q^{\transpose}\uvec\text{.}\) This is quite useful because it allows us to relate the values of \(q_A\) to those of \(q_D\text{,}\) which we already understand quite well.
Now it turns out that \(\vvec\) is also a unit vector because
\begin{equation*} |\vvec|^2 = \vvec\cdot\vvec = (Q^{\transpose}\uvec)\cdot(Q^{\transpose}\uvec) = \uvec\cdot(QQ^{\transpose}\uvec) = \uvec\cdot\uvec = |\uvec|^2 = 1\text{.} \end{equation*}
Therefore, the maximum value of \(q_A(\uvec)\) is the same as \(q_D(\vvec)\text{,}\) which we know to be \(5\) and which occurs in the direction \(\vvec=\twovec10\text{.}\) This means that the maximum value of \(q_A(\uvec)\) is also \(5\) and that this occurs in the direction \(\uvec = Q\vvec = Q\twovec10 = \twovec{1/\sqrt{5}}{-2/\sqrt{5}}\text{.}\) We now know that the maximum value of \(q_A(\uvec)\) is the largest eigenvalue \(\lambda_1=5\) and that this maximum value occurs in the direction of an associated eigenvector.
In the same way, we see that the minimum value of \(q_A(\uvec)\) is the smallest eigenvalue \(\lambda_2=-10\) and that this minimum occurs in the direction of \(\uvec=Q\twovec01 = \twovec{2/\sqrt{5}}{1/\sqrt{5}}\text{,}\) an associated eigenvector.
More generally, we have the following proposition, which summarizes this discussion.

Proposition 7.3.5.

If \(A\) is a symmetric matrix with largest eigenvalue \(\lambda_1\) and smallest eigenvalue \(\lambda_m\text{,}\) then the maximum value of \(q_A(\uvec)\) among all unit vectors \(\uvec\) is \(\lambda_1\text{,}\) which occurs in the direction of an associated unit eigenvector \(\uvec_1\text{.}\) Likewise, the minimum value of \(q_A(\uvec)\) is \(\lambda_m\text{,}\) which occurs in the direction of a unit eigenvector \(\uvec_m\) associated to \(\lambda_m\text{.}\)

Example 7.3.6.

Suppose that \(A\) is the symmetric matrix \(A=\begin{bmatrix} 0 \amp 6 \amp 0 \\ 6 \amp 3 \amp 6 \\ 0 \amp 6 \amp 6 \\ \end{bmatrix}\text{,}\) which may be orthogonally diagonalized as \(A=QDQ^{\transpose}\) where
\begin{equation*} D = \begin{bmatrix} 12 \amp 0 \amp 0 \\ 0 \amp 3 \amp 0 \\ 0 \amp 0 \amp -6 \\ \end{bmatrix}, \hspace{24pt} Q = \begin{bmatrix} 1/3 \amp 2/3 \amp 2/3 \\ 2/3 \amp 1/3 \amp -2/3 \\ 2/3 \amp -2/3 \amp 1/3 \\ \end{bmatrix}\text{.} \end{equation*}
We see that the maximum value of \(q_A(\uvec)\) is 12, which occurs in the direction \(\threevec{1/3}{2/3}{2/3}\text{,}\) and the minimum value is -6, which occurs in the direction \(\threevec{2/3}{-2/3}{1/3}\text{.}\)
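Because the entries here are exact rational numbers, Sage can confirm these claims directly. The following sketch verifies the diagonalization and evaluates \(q_A\) on the first and third columns of \(Q\text{.}\)

    A = matrix(QQ, [[0, 6, 0], [6, 3, 6], [0, 6, 6]])
    Q = matrix(QQ, [[1, 2, 2], [2, 1, -2], [2, -2, 1]]) / 3
    D = diagonal_matrix([12, 3, -6])

    print(Q * D * Q.transpose() == A)                 # True
    print(Q * Q.transpose() == identity_matrix(3))    # True, so Q is orthogonal

    # evaluate q_A on the unit eigenvectors, the columns of Q
    u = Q.column(0)
    print(u * (A * u))                                # 12, the maximum value
    w = Q.column(2)
    print(w * (A * w))                                # -6, the minimum value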

Example 7.3.7.

Suppose we have the demeaned column-variate data matrix
\begin{equation*} X = \begin{bmatrix} 2 \amp 1 \\ 1 \amp 2 \\ -3 \amp -3 \end{bmatrix} \end{equation*}
that we considered in Preview Activity 7.3.1. The demeaned case vectors are plotted as points in Figure 7.3.8.
Figure 7.3.8. The set of demeaned case vectors from Preview Activity 7.3.1.
Constructing the covariance matrix \(S=\frac12~X^{\transpose}X\) gives \(S=\begin{bmatrix} 14/2 \amp 13/2 \\ 13/2 \amp 14/2 \end{bmatrix}\text{,}\) which has eigenvalues \(\lambda_1 = 27/2\text{,}\) with associated eigenvector \(\twovec{1/\sqrt{2}}{1/\sqrt{2}}\text{,}\) and \(\lambda_2=1/2\text{,}\) with associated eigenvector \(\twovec{-1/\sqrt{2}}{1/\sqrt{2}}\text{.}\)
Remember that the variance in a direction \(\uvec\) is \(V_{\uvec} = \uvec\cdot(S\uvec) = q_S(\uvec)\text{.}\) Therefore, the variance attains a maximum value of 27/2 in the direction \(\twovec{1/\sqrt{2}}{1/\sqrt{2}}\) and a minimum value of 1/2 in the direction \(\twovec{-1/\sqrt{2}}{1/\sqrt{2}}\text{.}\) Figure 7.3.9 shows the data projected onto the lines defined by these vectors.
Figure 7.3.9. The demeaned data from Preview Activity 7.3.1 is shown projected onto the lines of maximal and minimal variance.
Remember that variance is additive, as stated in Proposition 7.1.10, which tells us that the total variance is \(V = 27/2 + 1/2 = 14\text{.}\)
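A Sage sketch along these lines reproduces the eigenvalues and eigenvectors of \(S\) (the eigenvectors are returned unnormalized, and possibly in a different order):

    S = matrix(QQ, [[14/2, 13/2], [13/2, 14/2]])
    print(S.eigenvalues())            # [27/2, 1/2]

    # each entry is (eigenvalue, [eigenvectors], multiplicity)
    for ev, vecs, mult in S.eigenvectors_right():
        print(ev, vecs)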
We’ve been focused on finding the directions in which a quadratic form attains its maximum and minimum values, but there’s another important observation to make after this activity. Recall how we used the fact that a symmetric matrix is orthogonally diagonalizable: if \(A=QDQ^{\transpose}\text{,}\) then \(q_A(\uvec) = q_D(\vvec)\) where \(\vvec = Q^{\transpose}\uvec\text{.}\)
More generally, if we define \(\yvec = Q^{\transpose}\xvec\text{,}\) we have
\begin{equation*} q_A(\xvec) = \xvec\cdot(A\xvec) = \xvec\cdot(QDQ^{\transpose}\xvec) = (Q^{\transpose}\xvec)\cdot(DQ^{\transpose}\xvec) = \yvec\cdot(D\yvec) = q_D(\yvec). \end{equation*}
Remembering that the quadratic form associated to a diagonal matrix has no cross terms, we obtain
\begin{equation*} q_A(\xvec) = q_D(\yvec) = \lambda_1y_1^2 + \lambda_2y_2^2 + \ldots + \lambda_my_m^2. \end{equation*}
In other words, after a change of coordinates, the quadratic form \(q_A\) can be written without cross terms. This is known as the Principal Axes Theorem.
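To illustrate, the following sketch carries out this change of coordinates symbolically for the matrix \(A=\begin{bmatrix} 1 \amp 2 \\ 2 \amp 1 \end{bmatrix}\) from Activity 7.3.3 and confirms that the cross term disappears.

    var('x1 x2')
    A = matrix([[1, 2], [2, 1]])
    Q = matrix([[1/sqrt(2), -1/sqrt(2)],
                [1/sqrt(2),  1/sqrt(2)]])    # columns are unit eigenvectors

    xv = vector([x1, x2])
    yv = Q.transpose() * xv                  # the new coordinates y = Q^T x

    qA = (xv * A * xv).expand()
    print(qA)                                # x1^2 + 4*x1*x2 + x2^2

    # in the new coordinates the form is 3 y1^2 - y2^2, with no cross term
    qD = 3*yv[0]^2 - yv[1]^2
    print((qA - qD).simplify_full())         # 0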
We will put this to use in the next section.

Subsection 7.3.2 Definite symmetric matrices

While our questions about variance provide some motivation for exploring quadratic forms, these functions appear in a variety of other contexts so it’s worth spending some more time with them. For example, quadratic forms appear in multivariable calculus when describing the behavior of a function of several variables near a critical point and in physics when describing the kinetic energy of a rigid body.
The following definition will be important in this section.

Definition 7.3.11.

A symmetric matrix \(A\) is called positive definite if its associated quadratic form satisfies \(q_A(\xvec) \gt 0\) for any nonzero vector \(\xvec\text{.}\) If \(q_A(\xvec) \geq 0\) for all nonzero vectors \(\xvec\text{,}\) we say that \(A\) is positive semidefinite.
Likewise, we say that \(A\) is negative definite if \(q_A(\xvec) \lt 0\) for all nonzero vectors \(\xvec\text{,}\) and negative semidefinite if \(q_A(\xvec) \leq 0\) for all nonzero vectors \(\xvec\text{.}\)
Finally, \(A\) is called indefinite if \(q_A(\xvec) \gt 0\) for some \(\xvec\) and \(q_A(\xvec) \lt 0\) for others.

Activity 7.3.4.

This activity explores the relationship between the eigenvalues of a symmetric matrix and its definiteness.
  1. Consider the diagonal matrix \(D=\begin{bmatrix} 4 \amp 0 \\ 0 \amp 2 \\ \end{bmatrix}\) and write its quadratic form \(q_D(\xvec)\) in terms of the components of \(\xvec=\twovec{x_1}{x_2}\text{.}\) How does this help you decide whether \(D\) is positive definite or not?
  2. Now consider \(D=\begin{bmatrix} 4 \amp 0 \\ 0 \amp 0 \\ \end{bmatrix}\) and write its quadratic form \(q_D(\xvec)\) in terms of \(x_1\) and \(x_2\text{.}\) What can you say about the definiteness of \(D\text{?}\)
  3. If \(D\) is a diagonal matrix, what condition on the diagonal entries guarantee that \(D\) is
    1. positive definite?
    2. positive semidefinite?
    3. negative definite?
    4. negative semidefinite?
    5. indefinite?
  4. Suppose that \(A\) is a symmetric matrix with eigenvalues 4 and 2 so that \(A=QDQ^{\transpose}\) where \(D=\begin{bmatrix}4 \amp 0 \\ 0 \amp 2 \end{bmatrix}\text{.}\) If \(\yvec = Q^{\transpose}\xvec\text{,}\) then we have \(q_A(\xvec) = q_D(\yvec)\text{.}\) Explain why this tells us that \(A\) is positive definite.
  5. Suppose that \(A\) is a symmetric matrix with eigenvalues 4 and 0. What can you say about the definiteness of \(A\) in this case?
  6. What condition on the eigenvalues of a symmetric matrix \(A\) guarantees that \(A\) is
    1. positive definite?
    2. positive semidefinite?
    3. negative definite?
    4. negative semidefinite?
    5. indefinite?
Answer.
  1. \(q_D(\xvec) = 4x_1^2 + 2x_2^2\) so \(D\) is positive definite.
  2. \(q_D(\xvec) = 4x_1^2\) so \(D\) is positive semidefinite.
  3.
    1. They are all positive.
    2. They are all nonnegative.
    3. They are all negative.
    4. They are all nonpositive.
    5. There are some positive eigenvalues and some negative ones.
  4. \(A\) is positive definite.
  5. \(A\) is positive semidefinite.
  6. The conditions are the same as in the diagonal case.
Solution.
  1. \(q_D(\xvec) = 4x_1^2 + 2x_2^2\text{.}\) Both addends are nonnegative, and one of them is positive if \(\xvec\) is nonzero. This means that \(q_D(\xvec) \gt 0\) when \(\xvec\) is nonzero and so \(D\) is positive definite.
  2. \(q_D(\xvec) = 4x_1^2\text{,}\) which is always nonnegative. However, \(q_D\left(\twovec01\right) = 0\) so \(D\) is positive semidefinite.
  3.
    1. They are all positive.
    2. They are all nonnegative.
    3. They are all negative.
    4. They are all nonpositive.
    5. There are some positive eigenvalues and some negative ones.
  4. Since we know that \(q_D(\yvec) \gt 0\) when \(\yvec\) is nonzero, we know that \(q_A(\xvec) \gt 0\) when \(\xvec\) is nonzero. Therefore, \(A\) is positive definite.
  5. It will be positive semidefinite.
  6. The conditions are the same as in the diagonal case.
As seen in this activity, it is straightforward to determine the definiteness of a diagonal matrix. For instance, if \(D=\begin{bmatrix} 7 \amp 0 \\ 0 \amp 5 \end{bmatrix}\text{,}\) then
\begin{equation*} q_D(\xvec) = 7x_1^2 + 5x_2^2. \end{equation*}
This shows that \(q_D(\xvec) \gt 0\) when either \(x_1\) or \(x_2\) is not zero so we conclude that \(D\) is positive definite. In the same way, we see that \(D\) is positive semidefinite if all the diagonal entries are nonnegative.
Understanding this behavior for diagonal matrices enables us to understand more general symmetric matrices. As we saw previously, the quadratic form for a symmetric matrix \(A=QDQ^{\transpose}\) agrees with the quadratic form for the diagonal matrix \(D\) after a change of coordinates. In particular,
\begin{equation*} q_A(\xvec) = q_D(\yvec) \end{equation*}
where \(\yvec=Q^{\transpose}\xvec\text{.}\) Now the diagonal entries of \(D\) are the eigenvalues of \(A\) from which we conclude that \(q_A(\xvec) \gt 0\) if all the eigenvalues of \(A\) are positive. Likewise, \(q_A(\xvec)\geq 0\) if all the eigenvalues are nonnegative.
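In other words, the eigenvalues tell the whole story. A small Sage helper, our own sketch rather than a built-in command, makes the classification explicit; the examples below have rational eigenvalues, so the sign comparisons are exact.

    def classify(A):
        # classify a symmetric matrix by the signs of its eigenvalues
        evs = A.eigenvalues()
        if all(ev > 0 for ev in evs):
            return "positive definite"
        if all(ev >= 0 for ev in evs):
            return "positive semidefinite"
        if all(ev < 0 for ev in evs):
            return "negative definite"
        if all(ev <= 0 for ev in evs):
            return "negative semidefinite"
        return "indefinite"

    print(classify(matrix(QQ, [[4, 0], [0, 2]])))   # positive definite
    print(classify(matrix(QQ, [[4, 0], [0, 0]])))   # positive semidefinite
    print(classify(matrix(QQ, [[1, 2], [2, 1]])))   # indefinite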
We will now apply what we’ve learned about quadratic forms to study the nature of critical points in multivariable calculus. The rest of this section assumes that the reader is familiar with ideas from multivariable calculus and can be skipped by others.
First, suppose that \(f(x,y)\) is a differentiable function. We will use \(f_x\) and \(f_y\) to denote the partial derivatives of \(f\) with respect to \(x\) and \(y\text{.}\) Similarly, \(f_{xx}\text{,}\) \(f_{xy}\text{,}\) \(f_{yx}\text{,}\) and \(f_{yy}\) denote the second partial derivatives. You may recall that the mixed partials \(f_{xy}\) and \(f_{yx}\) are equal under a mild assumption on the function \(f\text{.}\) A typical question in calculus is to determine where such a function has its maximum and minimum values.
Any local maximum or minimum of \(f\) appears at a critical point \((x_0,y_0)\) where
\begin{equation*} f_x(x_0,y_0) = 0,\hspace{24pt} f_y(x_0,y_0) = 0. \end{equation*}
Near a critical point, the quadratic approximation of \(f\) tells us that
\begin{align*} f(x,y)\approx f(x_0,y_0) \amp + \frac12 f_{xx}(x_0,y_0)(x-x_0)^2\\ \amp + f_{xy}(x_0,y_0)(x-x_0)(y-y_0) + \frac12 f_{yy}(x_0,y_0)(y-y_0)^2. \end{align*}

Activity 7.3.5.

Let’s explore how our understanding of quadratic forms helps us determine the behavior of a function \(f\) near a critical point.
  1. Consider the function \(f(x,y) = 2x^3 - 6xy + 3y^2\text{.}\) Find the partial derivatives \(f_{x}\) and \(f_y\) and use these expressions to determine the critical points of \(f\text{.}\)
  2. Evaluate the second partial derivatives \(f_{xx}\text{,}\) \(f_{xy}\text{,}\) and \(f_{yy}\text{.}\)
  3. Let’s first consider the critical point \((1,1)\text{.}\) Use the quadratic approximation as written above to find an expression approximating \(f\) near the critical point.
  4. Using the vector \(\wvec = \twovec{x-1}{y-1}\text{,}\) rewrite your approximation as
    \begin{equation*} f(x,y) \approx f(1,1) + q_A(\wvec) \end{equation*}
    for some matrix \(A\text{.}\) What is the matrix \(A\) in this case?
  5. Find the eigenvalues of \(A\text{.}\) What can you conclude about the definiteness of \(A\text{?}\)
  6. Recall that \((x_0,y_0)\) is a local minimum for \(f\) if \(f(x,y) \gt f(x_0,y_0)\) for nearby points \((x,y)\text{.}\) Explain why our understanding of the eigenvalues of \(A\) shows that \((1,1)\) is a local minimum for \(f\text{.}\)
Answer.
  1. We have
    \begin{align*} f_x \amp {}={} 6x^2 - 6y = 0\\ f_y \amp {}={} -6x + 6y = 0 \end{align*}
    with critical points \((0,0)\) and \((1,1)\text{.}\)
  2. We have
    \begin{equation*} f_{xx} = 12x, \hspace{24pt} f_{xy} = -6, \hspace{24pt} f_{yy} = 6 \end{equation*}
  3. This gives
    \begin{equation*} f(x,y)\approx -1 + \frac12 12(x-1)^2 -6(x-1)(y-1) + \frac12 6(y-1)^2 \end{equation*}
  4. \(A = \begin{bmatrix} 6 \amp -3 \\ -3 \amp 3 \end{bmatrix} \text{.}\)
  5. \(A\) is positive definite.
  6. \(f(x,y) \approx f(1,1) + q_A(\wvec) \gt f(1,1)\) for points \((x,y)\) near to \((1,1)\text{.}\)
Solution.
  1. We have
    \begin{align*} f_x \amp {}={} 6x^2 - 6y = 0\\ f_y \amp {}={} -6x + 6y = 0 \end{align*}
    which leads to the conditions \(y=x^2\) and \(y=x\text{.}\) This gives the critical points \((0,0)\) and \((1,1)\text{.}\)
  2. We have
    \begin{equation*} f_{xx} = 12x, \hspace{24pt} f_{xy} = -6, \hspace{24pt} f_{yy} = 6 \end{equation*}
  3. This gives
    \begin{equation*} f(x,y)\approx -1 + \frac12 12(x-1)^2 -6(x-1)(y-1) + \frac12 6(y-1)^2 \end{equation*}
  4. The matrix is \(A = \begin{bmatrix} 6 \amp -3 \\ -3 \amp 3 \end{bmatrix} \text{.}\)
  5. The eigenvalues are \(\lambda_1\approx 7.85\) and \(\lambda_2\approx 1.15\text{,}\) both of which are positive, which means that \(A\) is positive definite.
  6. We have \(q_A(\wvec) \gt 0\) if \(\wvec\) is nonzero so
    \begin{equation*} f(x,y) \approx f(1,1) + q_A(\wvec) \gt f(1,1) \end{equation*}
    for points \((x,y)\) near to \((1,1)\text{.}\)
Near a critical point \((x_0,y_0)\) of a function \(f(x,y)\text{,}\) we can write
\begin{equation*} f(x,y) \approx f(x_0, y_0) + q_A(\wvec) \end{equation*}
where \(\wvec = \twovec{x-x_0}{y-y_0}\) and \(A = \frac12 \begin{bmatrix} f_{xx}(x_0,y_0) \amp f_{xy}(x_0,y_0) \\ f_{yx}(x_0,y_0) \amp f_{yy}(x_0,y_0) \end{bmatrix}\text{.}\) If \(A\) is positive definite, then \(q_A(\wvec) \gt 0\text{,}\) which tells us that
\begin{equation*} f(x,y) \approx f(x_0,y_0) + q_A(\wvec) \gt f(x_0,y_0) \end{equation*}
and that the critical point \((x_0,y_0)\) is therefore a local minimum.
The matrix
\begin{equation*} H = \begin{bmatrix} f_{xx}(x_0,y_0) \amp f_{xy}(x_0,y_0) \\ f_{yx}(x_0,y_0) \amp f_{yy}(x_0,y_0) \end{bmatrix} \end{equation*}
is called the Hessian of \(f\text{,}\) and we see now that the eigenvalues of this symmetric matrix determine the nature of the critical point \((x_0,y_0)\text{.}\) In particular, if the eigenvalues are both positive, then \(q_H\) is positive definite, and the critical point is a local minimum.
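As a sketch, the Hessian analysis of Activity 7.3.5 can be carried out in Sage along these lines:

    var('x y')
    f = 2*x^3 - 6*x*y + 3*y^2

    # critical points: solve f_x = f_y = 0
    print(solve([f.diff(x) == 0, f.diff(y) == 0], [x, y]))
    # [[x == 0, y == 0], [x == 1, y == 1]]

    # the Hessian at the critical point (1, 1)
    H = matrix([[f.diff(x, 2), f.diff(x, y)],
                [f.diff(y, x), f.diff(y, 2)]]).subs(x=1, y=1)
    print(H.eigenvalues())
    # both eigenvalues are positive, so (1, 1) is a local minimum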
This observation leads to the Second Derivative Test for multivariable functions.
Most multivariable calculus texts assume that the reader is not familiar with linear algebra and so write the second derivative test for functions of two variables in terms of \(D=\det(H)\text{.}\) If
  • \(D \gt 0\) and \(f_{xx}(x_0,y_0) \gt 0\text{,}\) then \((x_0, y_0)\) is a local minimum.
  • \(D \gt 0\) and \(f_{xx}(x_0,y_0) \lt 0\text{,}\) then \((x_0, y_0)\) is a local maximum.
  • \(D \lt 0\text{,}\) then \((x_0,y_0)\) is neither a local maximum nor minimum.
The conditions in this version of the second derivative test are simply algebraic criteria that tell us about the definiteness of the Hessian matrix \(H\text{.}\)

Subsection 7.3.3 Summary

This section explored quadratic forms, functions that are defined by symmetric matrices.
  • If \(A\) is a symmetric matrix, then the quadratic form defined by \(A\) is the function \(q_A(\xvec) = \xvec\cdot(A\xvec)\text{.}\)
    Quadratic forms appear when studying the variance of a dataset. If \(S\) is the covariance matrix, then the variance in the direction defined by a unit vector \(\uvec\) is \(q_S(\uvec) = \uvec\cdot(S\uvec)=V_{\uvec}\text{.}\)
    Similarly, quadratic forms appear in multivariable calculus when analyzing the behavior of a function of several variables near a critical point.
  • If \(\lambda_1\) is the largest eigenvalue of a symmetric matrix \(A\) and \(\lambda_m\) the smallest, then the maximum value of \(q_A(\uvec)\) among unit vectors \(\uvec\) is \(\lambda_1\text{,}\) and this maximum value occurs in the direction of \(\uvec_1\text{,}\) a unit eigenvector associated to \(\lambda_1\text{.}\)
    Similarly, the minimum value of \(q_A(\uvec)\) is \(\lambda_m\text{,}\) which appears in the direction of \(\uvec_m\text{,}\) an eigenvector associated to \(\lambda_m\text{.}\)
  • A symmetric matrix is positive definite if its eigenvalues are all positive, positive semidefinite if its eigenvalues are all nonnegative, and indefinite if it has both positive and negative eigenvalues.
  • If the Hessian \(H\) of a multivariable function \(f\) is positive definite at a critical point, then the critical point is a local minimum. Likewise, if the Hessian is negative definite, the critical point is a local maximum.

Exercises 7.3.4 Exercises

1.

Suppose that \(A = \begin{bmatrix} 4 \amp 2 \\ 2 \amp 7 \end{bmatrix}\text{.}\)
  1. Find an orthogonal diagonalization of \(A\text{.}\)
  2. Evaluate the quadratic form \(q_A\left(\twovec11\right)\text{.}\)
  3. Find the unit vector \(\uvec\) for which \(q_A(\uvec)\) is as large as possible. What is the value of \(q_A(\uvec)\) in this direction?
  4. Find the unit vector \(\uvec\) for which \(q_A(\uvec)\) is as small as possible. What is the value of \(q_A(\uvec)\) in this direction?

2.

Consider the quadratic form
\begin{equation*} q\left(\twovec{x_1}{x_2}\right) = 3x_1^2 - 4x_1x_2 + 6x_2^2. \end{equation*}
  1. Find a matrix \(A\) such that \(q(\xvec) = \xvec^{\transpose}A\xvec\text{.}\)
  2. Find the maximum and minimum values of \(q(\uvec)\) among all unit vectors \(\uvec\) and describe the directions in which they occur.

3.

Suppose that \(X\) is a demeaned column-variate data matrix:
\begin{equation*} X= \begin{bmatrix} 1 \amp 1 \\ -2 \amp -1 \\ 0 \amp -1 \\ 1 \amp 1 \\ \end{bmatrix}\text{.} \end{equation*}
  1. Find the covariance matrix \(S_{XX}\text{.}\)
  2. What is the variance of the data projected onto the line defined by \(\uvec=\twovec{1/\sqrt{2}}{1/\sqrt{2}}\text{?}\)
  3. What is the total variance?
  4. In which direction is the variance greatest and what is the variance in this direction?

4.

Consider the matrix \(A = \begin{bmatrix} 4 \amp -3 \amp -3 \\ -3 \amp 4 \amp -3 \\ -3 \amp -3 \amp 4 \\ \end{bmatrix} \text{.}\)
  1. Find \(Q\) and \(D\) such that \(A=QDQ^{\transpose}\text{.}\)
  2. Find the maximum and minimum values of \(q(\uvec) = \uvec^{\transpose}A\uvec\) among all unit vectors \(\uvec\text{.}\)
  3. Describe the direction in which the minimum value occurs. What can you say about the direction in which the maximum occurs?

5.

Consider the matrix \(B = \begin{bmatrix} -2 \amp 1 \\ 4 \amp -2 \\ 2 \amp -1 \\ \end{bmatrix}\text{.}\)
  1. Find the matrix \(A\) so that \(q\left(\twovec{x_1}{x_2}\right) = \len{B\xvec}^2=q_A(\xvec)\text{.}\)
  2. Find the maximum and minimum values of \(q(\uvec)\) among all unit vectors \(\uvec\) and describe the directions in which they occur.
  3. What does the minimum value of \(q(\uvec)\) tell you about the matrix \(B\text{?}\)

6.

Consider the quadratic form
\begin{equation*} q\left(\threevec{x_1}{x_2}{x_3}\right) = 7x_1^2 + 4x_2^2 + 7x_3^2 - 2x_1x_2 -4x_1x_3-2x_2x_3. \end{equation*}
  1. What can you say about the definiteness of the matrix \(A\) that defines the quadratic form?
  2. Find a matrix \(Q\) so that the change of coordinates \(\yvec = Q^{\transpose}\xvec\) transforms the quadratic form into one that has no cross terms. Write the quadratic form in terms of \(\yvec\text{.}\)
  3. What are the maximum and minimum values for \(q(\uvec)\) among all unit vectors \(\uvec\text{?}\)

7.

Explain why the following statements are true.
  1. Given any matrix \(B\text{,}\) the matrix \(B^{\transpose}B\) is a symmetric, positive semidefinite matrix.
  2. If both \(A\) and \(B\) are symmetric, positive definite matrices, explain why \(A+B\) is a symmetric, positive definite matrix.
  3. If \(A\) is a symmetric, invertible, positive definite matrix, then \(A^{-1}\) is also.

8.

Determine whether the following statements are true or false and explain your reasoning.
  1. If \(A\) is an indefinite matrix, we can’t know whether it is positive definite or not.
  2. If the smallest eigenvalue of \(A\) is 3, then \(A\) is positive definite.
  3. If \(C\) is the covariance matrix associated with a data set, then \(C\) is positive semidefinite.
  4. If \(A\) is a symmetric \(2\by2\) matrix and the maximum and minimum values of \(q_A(\uvec)\) occur at \(\twovec10\) and \(\twovec01\text{,}\) then \(A\) is diagonal.
  5. If \(A\) is negative definite and \(Q\) is an orthogonal matrix with \(B = QAQ^{\transpose}\text{,}\) then \(B\) is negative definite.

9.

Determine the critical points for each of the following functions. At each critical point, determine the Hessian \(H\text{,}\) describe the definiteness of \(H\text{,}\) and determine whether the critical point is a local maximum or minimum.
  1. \(f(x,y) = xy + \frac2x + \frac2y\text{.}\)
  2. \(f(x,y) = x^4 + y^4 - 4xy\text{.}\)

10.

Consider the function \(f(x,y,z) = x^4 + y^4 +z^4 - 4xyz\text{.}\)
  1. Show that \(f\) has a critical point at \((-1,1,-1)\) and construct the Hessian \(H\) at that point.
  2. Find the eigenvalues of \(H\text{.}\) Is this a definite matrix of some kind?
  3. What does this imply about whether \((-1,1,-1)\) is a local maximum or minimum?