
Section 6.1 The dot product

Recall that our goal for the chapter is to find an approximate solution to \(A\xvec = \bvec\text{.}\) We will do this by solving \(A \xvec = \bhat\text{,}\) for a \(\bhat\) that is as close as possible to \(\bvec\text{.}\) We will start with a simple case: if \(\col(A)\) is spanned by a single vector, then geometrically, finding such a \(\bhat\) is the same as finding a projection.
Along the way, we will introduce a new operation on vectors, known as the dot product. Geometrically, the dot product is related to the lengths of vectors and the angles between pairs of vectors. Importantly, the dot product is also easily computed from the list-of-numbers representation of vectors. It also has many nice algebraic properties. So this is another situation where each of our three perspectives on vectors provides valuable insight.

Subsection 6.1.1 Projections and dot products

We will motivate the definition of the dot product by investigating the projection of a vector \(\bvec\) in the direction of a vector \(\vvec\text{,}\) which we will denote as \(\proj{\bvec}{\vvec}\text{.}\) The projection vector \(\proj{\bvec}{\vvec}\) is the scalar multiple \(k \vvec\) of \(\vvec\) that is as close as possible to \(\bvec\text{.}\) That means to find the projection vector, we must find the scalar \(k\) that makes the vector \(\bvec - k \vvec\) as short as possible. As is illustrated in Figure 6.1.1, this happens when \(k \vvec\) and \((\bvec - k \vvec)\) form a 90 degree angle.
Figure 6.1.1. Illustrating the projection \(\proj{\bvec}{\vvec}\text{.}\) We are looking for a vector \(k \vvec\) along the blue line that is as close as possible to \(\bvec\text{.}\) This will occur (upper right) when \(k \vvec\) and \(\bvec - k \vvec\) form a right angle. If \(k \vvec\) is shorter (lower left) or longer (lower right), then the angle will not be a right angle, and so a closer vector exists (because the legs of a right triangle are shorter than the hypotenuse).
By the definition of cosine, this means that
\begin{equation*} \frac{k \len{\vvec}}{\len{\bvec}} = \cos \theta \end{equation*}
where \(\len{\vvec}\) denotes the (Euclidean) length of the vector \(\vvec\) and \(\theta\) is the angle between the two vectors. Solving for \(k\text{,}\) we find that
\begin{equation*} k = \frac{\len{\bvec} \cos \theta}{\len{\vvec}}\text{.} \end{equation*}
So
\begin{equation*} \proj{\bvec}{\vvec} = \frac{\len{\bvec} \cos \theta}{\len{\vvec}} \vvec\text{.} \end{equation*}
Notice that
\begin{equation*} \proj{\bvec}{\vvec} = \frac{\len{\bvec} \cos \theta}{\len{\vvec}} \vvec = \frac{\len{\vvec}\len{\bvec} \cos \theta}{\len{\vvec}^2} \vvec \text{.} \end{equation*}
This leads to our geometric definition of the dot product, which is similar to the projection but (a) is symmetric in \(\bvec\) and \(\vvec\text{,}\) (b) avoids the denominator, and (c) results in a scalar rather than a vector. The result will be easier to work with algebraically and also easier to compute.
Figure 6.1.2. The dot product \(\xvec\cdot\yvec\) measures the angle \(\theta\text{.}\)

Definition 6.1.3.

For any two non-zero vectors \(\xvec\) and \(\yvec\) in \(\real^n\) with an angle \(\theta\) between them, we define their dot product, denoted \(\xvec \cdot \yvec\text{,}\) as
\begin{equation*} \xvec \cdot \yvec = \len{\xvec} \len{\yvec} \cos \theta\text{.} \end{equation*}
If either \(\xvec\) or \(\yvec\) is the zero vector, then \(\xvec \cdot \yvec = 0\text{.}\)

Note 6.1.4.

Note that the dot product of two vectors is a scalar. For this reason, the dot product is sometimes called the scalar product.
We can easily re-express the projection in terms of the dot product: since \(\bvec \cdot \vvec = \len{\vvec}\len{\bvec}\cos\theta\text{,}\) we have \(\proj{\bvec}{\vvec} = \frac{\bvec \cdot \vvec}{\len{\vvec}^2} \vvec\text{.}\)
The formulas for the dot product and for projection become simpler for unit vectors, vectors with length 1. If \(\uvec\) and \(\vvec\) are unit vectors in \(\real^n\) and \(\xvec\) and \(\yvec\) are any vectors in \(\real^n\text{,}\) then
\begin{equation*} \uvec \cdot \vvec = \len{\uvec} \len{\vvec} \cos \theta = \cos \theta\text{,} \end{equation*}
and
\begin{equation*} \proj{\xvec}{\uvec} = \frac{\xvec \cdot \uvec}{\len{\uvec}^2} \uvec = (\xvec \cdot \uvec) \uvec\text{.} \end{equation*}
And since \(\frac{\xvec}{\len{\xvec}}\) is a unit vector, this gives us another way to think about \(\proj{\yvec}{\xvec}\text{,}\) namely,
\begin{equation*} \proj{\yvec}{\xvec} = \proj{\yvec}{\frac{\xvec}{\len{\xvec}}} = \underbrace{\left(\yvec \cdot \frac{\xvec}{\len{\xvec}}\right)}_{\mathrm{length}} \underbrace{\frac{\xvec}{\len{\xvec}} }_{\mathrm{direction}}\text{.} \end{equation*}
In other words, we obtain the length of the projection by taking the dot product of \(\yvec\) with a unit vector in the same direction as \(\xvec\text{.}\)
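Expressed with the dot product, the projection formula translates directly into code. The following is a minimal sketch using numpy; the vectors b and v are made-up examples, not ones from the text:

```python
import numpy as np

def project(b, v):
    """Projection of b onto the line spanned by v: (b . v / v . v) v."""
    return (b @ v) / (v @ v) * v

b = np.array([2.0, 4.0])   # hypothetical example vectors
v = np.array([3.0, 0.0])

p = project(b, v)
print(p)                   # [2. 0.]
print((b - p) @ v)         # 0.0: the residual is orthogonal to v
```

Note that the function uses only dot products, so no angles or cosines need to be computed explicitly.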
A number of properties of the dot product follow from the definition; they are collected in Proposition 6.1.7 and restated at the end of this section.
The first two statements follow directly from the definition.
The third statement follows by observing that \(\cos \theta = \cos 0 = 1\) for two vectors that point in the same direction.
The fourth property deserves some explanation. First, we observe that a similar property holds for projections, namely,
\begin{equation*} \proj{(\yvec + \zvec)}{\xvec} = \proj{\yvec}{\xvec} + \proj{\zvec}{\xvec}\text{.} \end{equation*}
This is illustrated in Figure 6.1.8. The orange vector is the sum of the red and blue vectors, and the projections onto one of the axes are indicated. You can drag the background to get a different perspective and to see that this holds even if the two vectors and the axis of projection do not lie in a plane. Projecting onto an axis makes the visualization a little simpler, but changing the perspective demonstrates that this is not essential. It will always be the case that the projection of a sum is the sum of the projections.
The final property follows by (repeated) application of scalar multiples and distributivity.
Figure 6.1.8. The projection of a sum is the sum of projections. Drag the arrow heads to change the vectors. Drag on the background to change the perspective. Projections are illustrated with thinner, lighter lines of the same color as the vectors being projected. Each of the red, blue, and orange triangles is a right triangle. The blue projection vector is indicated twice, once as part of the blue triangle, and a second time translated so that it begins where the red projection vector ends.
From this it follows that
\begin{align*} \frac{(\yvec + \zvec)\cdot \xvec}{\len{\xvec}^2} \xvec \amp = \frac{\yvec \cdot \xvec}{\len{\xvec}^2} \xvec + \frac{\zvec \cdot \xvec}{\len{\xvec}^2} \xvec \\ \left( (\yvec + \zvec) \cdot \xvec \right) \frac{\xvec}{\len{\xvec}^2} \amp = \left( \yvec \cdot \xvec + \zvec \cdot \xvec \right) \frac{\xvec}{\len{\xvec}^2}\\ (\yvec + \zvec) \cdot \xvec \amp = \yvec \cdot \xvec + \zvec \cdot \xvec \end{align*}
These properties of the dot product allow us to algebraically rearrange or simplify many expressions involving dot products.

Example 6.1.9.

Suppose that \(\vvec_1\cdot\wvec = 4\) and \(\vvec_2\cdot\wvec = -7\text{.}\) Then
\begin{equation*} \begin{aligned} (2\vvec_1)\cdot\wvec \amp {}={} 2(\vvec_1\cdot\wvec) = 2(4) = 8 \\ (-3\vvec_1+ 2\vvec_2)\cdot\wvec \amp {}={} -3(\vvec_1\cdot\wvec) + 2(\vvec_2\cdot\wvec) = -3(4)+2(-7) = -26 \end{aligned}\text{.} \end{equation*}
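The linearity used in this example can also be checked numerically. Here is a sketch with made-up vectors chosen so that \(\vvec_1\cdot\wvec = 4\) and \(\vvec_2\cdot\wvec = -7\text{,}\) as in the example:

```python
import numpy as np

w  = np.array([1.0, 0.0])
v1 = np.array([4.0, 5.0])    # chosen so that v1 . w = 4
v2 = np.array([-7.0, 3.0])   # chosen so that v2 . w = -7

print((2 * v1) @ w)              # 8.0, matching 2(v1 . w)
print((-3 * v1 + 2 * v2) @ w)    # -26.0, matching -3(4) + 2(-7)
```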
As we move forward, it will be important for us to recognize when vectors are perpendicular to one another. For instance, when vectors \(\vvec\) and \(\wvec\) are perpendicular, the angle between them is \(\theta=90^\circ\text{,}\) and we have
\begin{equation*} \vvec\cdot\wvec=\len{\vvec}\len{\wvec}\cos\theta = \len{\vvec}\len{\wvec}\cos90^\circ = 0\text{.} \end{equation*}
Therefore, the dot product between perpendicular vectors must be zero. This leads to the following definition.

Definition 6.1.10.

We say that vectors \(\vvec\) and \(\wvec\) are orthogonal if \(\vvec\cdot\wvec=0\text{.}\) We can denote this with \(\vvec \perp \wvec\text{.}\)
In practical terms, two perpendicular vectors are orthogonal. However, the concept of orthogonality is somewhat more general because it allows one or both of the vectors to be the zero vector \(\zerovec\text{.}\)
We turn next to the important question of how to compute dot products.

Subsection 6.1.2 Computing dot products

The geometric definition of the dot product motivates many of its applications, but it can be difficult to compute the dot product from this definition because \(\cos \theta\) may be difficult to obtain, especially in high dimensions. Fortunately, there is a computational shortcut that follows directly from Proposition 6.1.7.
Let
\begin{equation*} \vvec = \left[ \begin{array}{c} v_1 \\ v_2 \\ \vdots \\ v_n \\ \end{array} \right] \text{ and } \wvec = \left[ \begin{array}{c} w_1 \\ w_2 \\ \vdots \\ w_n \\ \end{array} \right] \end{equation*}
be two vectors in \(\real^n\text{.}\) Then we can write
\begin{align*} \vvec \amp = v_1 \evec_1 + v_2 \evec_2 + \cdots + v_n \evec_n\\ \wvec \amp = w_1 \evec_1 + w_2 \evec_2 + \cdots + w_n \evec_n\text{.} \end{align*}
From this it follows that
\begin{align*} \vvec \cdot \wvec \amp = (v_1 \evec_1 + v_2 \evec_2 + \cdots + v_n \evec_n) \cdot (w_1 \evec_1 + w_2 \evec_2 + \cdots + w_n \evec_n) \\ \amp = \sum_{i=1}^{n} \sum_{j=1}^{n} v_i w_j \evec_i \cdot \evec_j \\ \amp = \sum_{i = j} v_i w_i \cdot 1 + \sum_{i \neq j} v_i w_j \cdot 0\\ \amp = \sum_{i=1}^{n} v_i w_i \end{align*}
because
\begin{equation*} \evec_i \cdot \evec_j = \begin{cases} 0 \amp i \neq j \\ 1 \amp i = j \\ \end{cases} \;\;\text{,} \end{equation*}
so the only terms in the sum that are not 0 are the ones where \(i\) and \(j\) match.
In other words, the dot product can be computed as the sum of products of corresponding entries in the two vectors. It will be useful, whenever you see a sum of products, to ask how it might be interpreted as a dot product.
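This sum-of-products rule is one line of code. A sketch in plain Python:

```python
def dot(v, w):
    """Dot product as the sum of products of corresponding entries."""
    assert len(v) == len(w), "vectors must have the same dimension"
    return sum(vi * wi for vi, wi in zip(v, w))

print(dot([2, 0, -3, 1], [-1, 3, 1, 2]))   # -3
print(dot([3, 2], [3, 2]))                 # 13, the squared length of (3, 2)
```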

Note 6.1.12.

\begin{equation*} \vvec\cdot\wvec = \left[ \begin{array}{c} v_1 \\ v_2 \\ \vdots \\ v_n \\ \end{array} \right] \cdot \left[ \begin{array}{c} w_1 \\ w_2 \\ \vdots \\ w_n \\ \end{array} \right] = v_1w_1 + v_2w_2 + \cdots + v_n w_n = \begin{bmatrix} v_1 \amp v_2 \amp \cdots \amp v_n \\ \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \\ \end{bmatrix}\text{.} \end{equation*}
So a dot product can be computed as a matrix product of a \(1 \by n\) matrix and an \(n \by 1\) matrix.
This means that, for a fixed vector \(\vvec\text{,}\) the map \(\wvec \mapsto \vvec \cdot \wvec\) is a linear transformation from \(\real^n\) to \(\real\text{.}\) There is a 3Blue1Brown video that illustrates the connection between the dot product and this linear transformation. We recommend that you view it.
We will write
\begin{equation*} \begin{bmatrix} v_1 \amp v_2 \amp \cdots \amp v_n \\ \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \\ \end{bmatrix}^\transpose \end{equation*}
and call the wide version the transpose of the tall version. For more about matrix transposes see Section 6.2.
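In numpy, this row-times-column view is literal: representing \(\vvec\) and \(\wvec\) as \(n \by 1\) matrices, \(\vvec^\transpose \wvec\) is a \(1 \by 1\) matrix whose single entry is the dot product. A sketch:

```python
import numpy as np

v = np.array([[2], [0], [-3], [1]])   # n x 1 column vector
w = np.array([[-1], [3], [1], [2]])

# v^T w is a 1 x 1 matrix whose single entry is the dot product.
result = v.T @ w
print(result)            # [[-3]]
print(result.shape)      # (1, 1)
```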

Example 6.1.13.

For two-dimensional vectors \(\xvec\) and \(\yvec\text{,}\) their dot product \(\xvec\cdot\yvec\) is
\begin{equation*} \xvec\cdot\yvec = \twovec{x_1}{x_2}\cdot\twovec{y_1}{y_2} = x_1y_1 + x_2y_2\text{.} \end{equation*}
For instance,
\begin{equation*} \twovec{2}{-3}\cdot\twovec{4}{1} = 2\cdot 4 + (-3)\cdot 1 = 5\text{.} \end{equation*}

Example 6.1.14.

We compute the dot product between two four-dimensional vectors as
\begin{equation*} \left[ \begin{array}{c} 2 \\ 0 \\ -3 \\ 1 \\ \end{array} \right] \cdot \left[ \begin{array}{c} -1 \\ 3 \\ 1 \\ 2 \\ \end{array} \right] = 2(-1) + 0(3) + (-3)(1) + 1(2) = -3\text{.} \end{equation*}

Example 6.1.15.

From Proposition 6.1.7 we know that \(\len{\vvec}^2 = \vvec \cdot \vvec\text{.}\) We can also compute the length of a vector using the Pythagorean theorem. Using our computational shortcut, we see that these are the same calculation.
For example, consider the vector \(\vvec = \twovec32\) as shown in Figure 6.1.16.
Figure 6.1.16. The vector \(\vvec=\twovec32\text{.}\)
We may find the length of this vector using the Pythagorean theorem since the vector forms the hypotenuse of a right triangle having a horizontal leg of length 3 and a vertical leg of length 2, so \(\len{\vvec}^2 = 3^2 + 2^2 = 13\text{.}\) Now notice that the dot product of \(\vvec\) with itself performs the identical arithmetic:
\begin{equation*} \vvec\cdot\vvec = 3(3) + 2(2) = 13 = \len{\vvec}^2\text{.} \end{equation*}
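A quick numerical check of this identity, assuming numpy:

```python
import numpy as np

v = np.array([3.0, 2.0])
print(v @ v)    # 13.0, the square of the length of v

# np.linalg.norm computes the Euclidean length, so norm(v)**2
# agrees with v @ v up to floating-point rounding.
print(np.isclose(np.linalg.norm(v) ** 2, v @ v))   # True
```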

Activity 6.1.1.

  1. Compute the dot product
    \begin{equation*} \twovec{3}{4}\cdot\twovec{2}{-2}\text{.} \end{equation*}
  2. Sketch the vector \(\vvec=\twovec{3}{4}\) below. Then use the Pythagorean theorem to find the length of \(\vvec\text{.}\)
    Figure 6.1.17. Sketch the vector \(\vvec\) and find its length.
  3. Compute the dot product \(\vvec\cdot\vvec\text{.}\) How is the dot product related to the length of \(\vvec\text{?}\)
  4. Remember that the matrix \(\mattwo0{-1}10\) represents the matrix transformation that rotates vectors counterclockwise by \(90^\circ\text{.}\) Beginning with the vector \(\vvec = \twovec34\text{,}\) find \(\wvec\text{,}\) the result of rotating \(\vvec\) by \(90^\circ\text{,}\) and sketch it above.
  5. What is the dot product \(\vvec\cdot\wvec\text{?}\)
  6. Suppose that \(\vvec=\twovec ab\text{.}\) Find the vector \(\wvec\) that results from rotating \(\vvec\) by \(90^\circ\) and find the dot product \(\vvec\cdot\wvec\text{.}\)
Solution.
  1. \(\twovec34\cdot\twovec2{-2} = 3\cdot2+4\cdot(-2) = -2\text{.}\)
  2. The length of \(\vvec\) is 5.
  3. \(\vvec\cdot\vvec = 25\text{,}\) which is the square of the length of \(\vvec\text{.}\)
  4. \(\displaystyle \wvec=\twovec{-4}3\)
  5. \(\vvec\cdot\wvec=0\text{.}\)
  6. \(\vvec\cdot\wvec=0\text{.}\)
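The rotation used in parts 4 and 5 of this activity can be checked numerically. A sketch assuming numpy:

```python
import numpy as np

R = np.array([[0, -1],
              [1,  0]])   # rotates vectors counterclockwise by 90 degrees

v = np.array([3, 4])
w = R @ v                 # the rotated vector
print(w)                  # [-4  3]
print(v @ w)              # 0: a vector is orthogonal to its 90-degree rotation
```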

Activity 6.1.2.

  1. Sketch the vectors \(\vvec=\twovec32\) and \(\wvec=\twovec{-1}3\) using Figure 6.1.18.
    Figure 6.1.18. Sketch the vectors \(\vvec\) and \(\wvec\) here.
  2. Find the lengths \(\len{\vvec}\) and \(\len{\wvec}\) using the dot product.
  3. Find the dot product \(\vvec\cdot\wvec\) and use it to find the angle between \(\vvec\) and \(\wvec\text{.}\)
  4. Consider the vector \(\xvec = \twovec{-2}{3}\text{.}\) Include it in your sketch in Figure 6.1.18 and find the angle between \(\vvec\) and \(\xvec\text{.}\)
  5. If two vectors are perpendicular, what can you say about their dot product? Explain your thinking.
  6. For what value of \(k\) is the vector \(\twovec6k\) perpendicular to \(\wvec\text{?}\)
  7. Python can be used to find lengths of vectors and their dot products. For instance, if v and w are vectors, then np.linalg.norm(v) gives the length of \(\vvec\) and v @ w gives \(\vvec\cdot\wvec\text{.}\)
    Suppose that
    \begin{equation*} \vvec=\fourvec203{-2}, \hspace{24pt} \wvec=\fourvec1{-3}41\text{.} \end{equation*}
    Use the Python cell below to find \(\len{\vvec}\text{,}\) \(\len{\wvec}\text{,}\) \(\vvec\cdot\wvec\text{,}\) and the angle between \(\vvec\) and \(\wvec\text{.}\) You may use math.acos() to find the angle’s measure expressed in radians.
Answer.
  1. \(\len{\vvec} = \sqrt{13}\) and \(\len{\wvec} = \sqrt{10}\text{.}\)
  2. \(\displaystyle \theta = 52.1^\circ.\)
  3. \(\displaystyle \theta = 90^\circ.\)
  4. Their dot product must be zero.
  5. \(k=2\text{.}\)
  6. \(55.9^\circ\text{.}\)
Solution.
  1. We find that
    \begin{gather*} \vvec\cdot\vvec {}={} 13\\ \wvec\cdot\wvec {}={} 10 \end{gather*}
    so that \(\len{\vvec} = \sqrt{13}\) and \(\len{\wvec} = \sqrt{10}\text{.}\)
  2. \(\vvec\cdot\wvec = 3\) so that
    \begin{equation*} \theta = \arccos\left(\frac{\vvec\cdot\wvec}{\len{\vvec}\len{\wvec}}\right) = 52.1^\circ. \end{equation*}
  3. \(\vvec\cdot\xvec = 0\) so that
    \begin{equation*} \theta = \arccos\left(\frac{\vvec\cdot\xvec}{\len{\vvec}\len{\xvec}}\right) = 90^\circ. \end{equation*}
  4. If two vectors are perpendicular, then the angle between them is \(90^\circ\text{.}\) Since \(\cos(90^\circ) = 0\text{,}\) their dot product must be zero.
  5. The dot product is \(-6 + 3k = 0\) so \(k=2\text{.}\)
  6. We find that \(\len{\vvec} = \sqrt{17}\text{,}\) \(\len{\wvec} = 3\sqrt{3}\text{,}\) \(\vvec\cdot\wvec = 12\text{,}\) and the angle between these vectors is \(55.9^\circ\text{.}\)
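The interactive cell is not reproduced here, but the computation in the last part can be sketched as follows:

```python
import math
import numpy as np

v = np.array([2, 0, 3, -2])
w = np.array([1, -3, 4, 1])

len_v = np.linalg.norm(v)    # sqrt(17)
len_w = np.linalg.norm(w)    # sqrt(27) = 3 sqrt(3)
dot_vw = v @ w               # 12

# math.acos returns radians; multiply by 180.0 / math.pi for degrees.
theta = math.acos(dot_vw / (len_v * len_w)) * 180.0 / math.pi
print(round(theta, 1))       # 55.9
```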
As we have seen, we can use the dot product to compute the angle between two vectors. This angle (or the cosine of the angle) is often taken as a measure of "similarity" between two vectors. For example, consider the vectors \(\uvec\text{,}\) \(\vvec\text{,}\) and \(\wvec\text{,}\) shown in Figure 6.1.19. The vectors \(\vvec\) and \(\wvec\) seem somewhat similar as the directions they define are nearly the same. By comparison, \(\uvec\) appears rather dissimilar to both \(\vvec\) and \(\wvec\text{.}\) We will measure the similarity of vectors by finding the angle between them; the smaller the angle, the more similar the vectors. This is especially useful in contexts where the direction of a vector conveys more important information than its magnitude.
Figure 6.1.19. Which of the vectors are most similar?

Activity 6.1.3.

This activity explores two uses of the dot product as a way to compute the "similarity" of vectors.
  1. Our first task is to assess the similarity between various Wikipedia articles by forming vectors from each of five articles. In particular, one may download the text from a Wikipedia article, remove common words, such as “the” and “and”, count the number of times the remaining words appear in the article, and represent these counts in a vector, called the document vector for each article.
    For example, evaluate the following cell that loads a matrix constructed from the Wikipedia articles on Veteran’s Day, Memorial Day, Labor Day, the Golden Globe Awards, and the Super Bowl. Each row of the matrix represents one of 604 words and each column represents one of the articles as a document vector. For instance, the word “act” appears 3 times in the Veteran’s Day article and 0 times in the Labor Day article.
    1. Suppose that two articles have no words in common. What is the value of the dot product between their corresponding vectors? What does this say about the angle between these vectors?
    2. Suppose there are two articles on the same subject, yet one article is twice as long. What approximate relationship would you expect to hold between the two vectors? What does this say about the angle between them?
    3. Use the Python cell below to find the angle between the document vector for the Veteran’s Day article and the other four articles. To express the angle in degrees, multiply radians by 180.0 / math.pi.
    4. Compare the four angles you have found and discuss what they mean about the similarity between the Veteran’s Day article and the other four. How do your findings reflect the nature of these five events?
  2. Vectors are often used to represent how a quantity changes over time. For instance, the vector \(\svec=\fourvec{78.3}{81.2}{82.1}{79.0}\) might represent the value of a company’s stock on four consecutive days. When interpreted in this way, we call the vector a time series. Evaluate the Python cell below to see a representation of time series for four different stocks over 10 days.
    Notice that although one stock has a higher value, stocks 0 and 3 appear to be related since they seem to rise and fall in roughly similar ways. We often say that such series are correlated, and we would like to measure the degree to which they are correlated.
    1. In order to compare the ways in which they rise and fall, we will first demean each time series; that is, for each time series, we will subtract its average value to obtain a new time series.
    2. If the demeaned series are \(\tilde{\svec}_1\) and \(\tilde{\svec}_2\text{,}\) then the correlation between \(\svec_1\) and \(\svec_2\) is defined to be
      \begin{equation*} \corr(\svec_1, \svec_2) = \frac{\tilde{\svec}_1\cdot\tilde{\svec}_2} {\len{\tilde{\svec}_1}\len{\tilde{\svec}_2}} = \frac{\len{\tilde{\svec}_1}\len{\tilde{\svec}_2} \cos \theta} {\len{\tilde{\svec}_1}\len{\tilde{\svec}_2}} = \cos \theta \end{equation*}
      where \(\theta\) is the angle between \(\tilde{\svec}_1\) and \(\tilde{\svec}_2\text{.}\) That is, the correlation equals the cosine of the angle between the demeaned time series. Among other things, this implies that \(\corr(\svec_1,\svec_2)\) is always between \(-1\) and \(1\text{.}\)
      Find the correlation between each pair of stocks.
    3. Suppose that two time series are such that their demeaned time series are scalar multiples of one another, as in Figure 6.1.20
      Figure 6.1.20. On the left, the demeaned time series are positive scalar multiples of one another. On the right, they are negative scalar multiples.
      For instance, suppose we have time series \(\tvec_1\) and \(\tvec_2\) whose demeaned time series \(\tilde{\tvec}_1\) and \(\tilde{\tvec}_2\) are positive scalar multiples of one another. What is the angle between the demeaned vectors? What does this say about the correlation \(\corr(\tvec_1, \tvec_2)\text{?}\)
    4. Suppose the demeaned time series \(\tilde{\tvec}_1\) and \(\tilde{\tvec}_2\) are negative scalar multiples of one another, what is the angle between the demeaned vectors? What does this say about the correlation \(\corr(\tvec_1, \tvec_2)\text{?}\)
    5. Which pair of stocks had the largest correlation? How is this reflected in the plot?
    6. Which pair of stocks had the smallest (most negative) correlation? How is this reflected in the plot?
    7. Which pair of stocks had a correlation closest to 0? How is this reflected in the plot?
    8. The correlation is important enough that numpy can do all the work of computing correlations for each pair of rows (default) or columns (what we want here) in a matrix. The result is a matrix containing all the pairwise correlations.
      • Explain why the shape of the correlation matrix is what it is. What shape would it have been if we had not used rowvar = False?
      • Explain why this matrix is symmetric.
      • Explain why the diagonal values are what they are.
Answer.
    1. They are perpendicular.
    2. The angle should be close to 0.
    3. The angle veterans makes with memorial is \(46.4^\circ\text{,}\) with labor is \(59.2^\circ\text{,}\) with golden is \(86.2^\circ\text{,}\) and with super is \(86.4^\circ\text{.}\)
    4. The articles on Veteran’s Day and Memorial Day are most similar.
    1. The graphs are now lowered so that their averages are zero.
Solution.
    1. If there are no words in common, then the dot product between the two vectors will be zero. This means that they are perpendicular to one another.
    2. The vectors should be, at least approximately, scalar multiples of one another, which means that the angle between them is zero.
    3. The angle veterans makes with memorial is \(46.4^\circ\text{,}\) with labor is \(59.2^\circ\text{,}\) with golden is \(86.2^\circ\text{,}\) and with super is \(86.4^\circ\text{.}\)
    4. It appears that the articles on Veteran’s Day and Memorial Day are most similar. This makes sense because both are U.S. national holidays that honor military service. The second most similar article is Labor Day, which is also a national holiday. The other two are quite dissimilar as they are entertainment events.
    1. The graphs are now lowered so that their averages are zero.
    2. See below for all of the correlations.
    3. The angle should be zero, which means that the correlation will be \(1\text{.}\)
    4. The angle should be \(180^\circ\text{,}\) which means that the correlation should be \(-1\text{.}\)
    5. The largest correlation is \(0.97\) for stocks 0 and 3, which are most similar.
    6. The smallest correlation is \(-0.91\) for stocks 1 and 3, which are least similar. When one goes up, the other tends to go down.
    7. The closest correlation to 0 is \(-0.090\) for stocks 3 and 4. These stocks seem unrelated. When one goes up, the other sometimes goes up, sometimes down.
    8. Correlation matrices are square and have a row and column for each of the vectors we computed correlations with. Since we have 4 column vectors here, we get a \(4 \by 4\) matrix. It would be \(10 \by 10\) if we did row-wise correlations. These matrices are always symmetric because the definition of correlation is symmetric. The diagonal entries are 1 because
      \begin{equation*} \corr(\xvec, \xvec) = \frac{\xvec \cdot \xvec}{\len{\xvec}\len{\xvec}} = \frac{\xvec \cdot \xvec}{\len{\xvec}^2} = 1\text{.} \end{equation*}
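The correlation matrix described in the last part can be sketched with np.corrcoef. The data below is made up, since the stock data itself is loaded by an interactive cell:

```python
import numpy as np

# Hypothetical data: each column is one time series.
series = np.array([[78.3, 10.2],
                   [81.2,  9.1],
                   [82.1,  8.8],
                   [79.0, 10.0]])

# rowvar=False treats columns (not rows) as the variables to correlate.
C = np.corrcoef(series, rowvar=False)
print(C.shape)                          # (2, 2): one row/column per series

# The correlation equals the cosine of the angle between demeaned series.
demeaned = series - series.mean(axis=0)
cos_theta = demeaned[:, 0] @ demeaned[:, 1] / (
    np.linalg.norm(demeaned[:, 0]) * np.linalg.norm(demeaned[:, 1]))
print(np.isclose(cos_theta, C[0, 1]))   # True
```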
We conclude this section by listing once more the important properties of the dot product.

Properties of the dot product.

Let \(\xvec\text{,}\) \(\yvec\text{,}\) and \(\wvec \) be vectors in \(\real^n\text{,}\) let \(\theta\) be the angle between \(\xvec\) and \(\yvec\text{,}\) and let \(a \) and \(b\) be scalars. Then
  1. \(\xvec\cdot\yvec = \len{\xvec}\len{\yvec}\cos\theta\text{.}\)
  2. \(\xvec\cdot\xvec = \len{\xvec}^2 \text{.}\)
  3. \(\proj{\yvec}{\xvec} = \frac{\yvec \cdot \xvec}{\len{\xvec}^2} \xvec \text{.}\)
  4. \(\xvec\cdot\yvec = \yvec\cdot\xvec \text{.}\)
  5. \((a\xvec)\cdot\yvec = a(\xvec\cdot\yvec)\text{.}\)
  6. \((a\xvec+b\yvec)\cdot\wvec = a\xvec\cdot\wvec + b\yvec\cdot\wvec\text{.}\)
  7. \(\xvec \cdot \yvec = \sum_{i=1}^n x_i y_i\text{.}\)

Subsection 6.1.3 \(k\)-means clustering

A typical problem in data science is to find some underlying patterns in a dataset. Suppose, for instance, that we have the set of 177 data points plotted in Figure 6.1.21. Notice that the points are not scattered around haphazardly; instead, they seem to form clusters. Our goal here is to develop a strategy for detecting the clusters.
Figure 6.1.21. A set of 177 data points.
To see how this could be useful, suppose we have medical data describing a group of patients, some of whom have been diagnosed with a specific condition, such as diabetes. Perhaps we have a record of age, weight, blood sugar, cholesterol, and other attributes for each patient. It could be that the data points for the group diagnosed as having the condition form a cluster that is somewhat distinct from the rest of the data. Suppose that we are able to identify that cluster and that we are then presented with a new patient that has not been tested for the condition. If the attributes for that patient place them in that cluster, we might identify them as being at risk for the condition and prioritize them for appropriate screenings.
If there are many attributes for each patient, the data may be high-dimensional and not easily visualized. We would therefore like to develop an algorithm that separates the data points into clusters without human intervention. We call the result a clustering.
The next activity introduces a technique, called \(k\)-means clustering, that helps us find clusterings. To do so, we will view the data points as vectors so that the distance between two data points equals the length of the vector joining them. That is, if two points are represented by the vectors \(\vvec\) and \(\wvec\text{,}\) then the distance between the points is \(\len{\vvec-\wvec}\text{.}\)

Activity 6.1.4.

To begin, we identify the centroid, or the average, of a set of vectors \(\vvec_1, \vvec_2, \cdots,\vvec_n\) as
\begin{equation*} \frac1n\left(\vvec_1+\vvec_2+\cdots+\vvec_n\right)\text{.} \end{equation*}
  1. Find the centroid of the vectors
    \begin{equation*} \vvec_1=\twovec11, \vvec_2=\twovec41, \vvec_3=\twovec44. \end{equation*}
    and sketch the vectors and the centroid using Figure 6.1.22. You may wish to simply plot the points represented by the tips of the vectors rather than drawing the vectors themselves.
    Figure 6.1.22. The vectors \(\vvec_1\text{,}\) \(\vvec_2\text{,}\) \(\vvec_3\) and their centroid.
    Notice that the centroid lies in the center of the points defined by the vectors.
  2. Now we’ll illustrate an algorithm that forms clusterings. To begin, consider the following points, represented as vectors,
    \begin{equation*} \vvec_1=\twovec{-2}{1}, \vvec_2=\twovec11, \vvec_3=\twovec12, \vvec_4=\twovec32, \end{equation*}
    which are shown in Figure 6.1.23.
    Figure 6.1.23. We will group this set of four points into two clusters.
    Suppose that we would like to group these points into \(k=2\) clusters. (Later on, we’ll see how to choose an appropriate value for \(k\text{,}\) the number of clusters.) We begin by choosing two points \(c_1\) and \(c_2\) at random and declaring them to be the “centers” of the two clusters.
    For example, suppose we randomly choose \(c_1=\vvec_2\) and \(c_2=\vvec_3\) as the center of two clusters. The cluster centered on \(c_1=\vvec_2\) will be the set of points that are closer to \(c_1=\vvec_2\) than to \(c_2=\vvec_3\text{.}\) Determine which of the four data points are in this cluster, which we denote by \(C_1\text{,}\) and circle them in Figure 6.1.23.
  3. The second cluster will consist of the data points that are closer to \(c_2=\vvec_3\) than \(c_1=\vvec_2\text{.}\) Determine which of the four points are in this cluster, which we denote by \(C_2\text{,}\) and circle them in Figure 6.1.23.
  4. We now have a clustering with two clusters, but we will try to improve upon it in the following way. First, find the centroids of the two clusters; that is, redefine \(c_1\) to be the centroid of cluster \(C_1\) and \(c_2\) to be the centroid of \(C_2\text{.}\) Find those centroids and indicate them in Figure 6.1.24
    Figure 6.1.24. Indicate the new centroids and clusters.
    Now update the cluster \(C_1\) to be the set of points closer to \(c_1\) than \(c_2\text{.}\) Update the cluster \(C_2\) in a similar way and indicate the clusters in Figure 6.1.24.
  5. Let’s perform this last step again. That is, update the centroids \(c_1\) and \(c_2\) from the new clusters and then update the clusters \(C_1\) and \(C_2\text{.}\) Indicate your centroids and clusters in Figure 6.1.25.
    Figure 6.1.25. Indicate the new centroids and clusters.
    Notice that this last step produces the same set of clusters so there is no point in repeating it. We declare this to be our final clustering.
Answer.
  1. The centroid is \(c = \twovec32\text{.}\)
  2. \(C_1 = \{\vvec_1, \vvec_2\}\text{.}\)
  3. \(C_2=\{\vvec_3,\vvec_4\}\text{.}\)
  4. \(c_1 = \twovec{-1/2}{0}\) and \(c_2 = \twovec22\)
    \(C_1 = \{\vvec_1\}\) and \(C_2=\{\vvec_2, \vvec_3, \vvec_4\}\text{.}\)
  5. \(c_1 = \twovec{-2}1\) and \(c_2 = \twovec{5/3}{5/3}\text{.}\) The clusters \(C_1\) and \(C_2\) are unchanged.
Solution.
  1. The centroid is \(c = \twovec32\text{.}\)
  2. The first cluster is \(C_1 = \{\vvec_1, \vvec_2\}\text{.}\)
  3. The second cluster is \(C_2=\{\vvec_3,\vvec_4\}\text{.}\)
  4. We redefine \(c_1 = \twovec{-1/2}{0}\) and \(c_2 = \twovec22\text{.}\) This leads to new clusters \(C_1 = \{\vvec_1\}\) and \(C_2=\{\vvec_2, \vvec_3, \vvec_4\}\text{.}\)
  5. We have new centroids \(c_1 = \twovec{-2}1\) and \(c_2 = \twovec{5/3}{5/3}\text{,}\) and the clusters \(C_1\) and \(C_2\) are unchanged.
This activity demonstrates our algorithm for finding a clustering. We first choose a value \(k\) and seek to break the data points into \(k\) clusters. The algorithm proceeds in the following way:
  • Choose \(k\) points \(c_1, c_2, \ldots, c_k\) at random from our data set.
  • Construct the cluster \(C_1\) as the set of data points closest to \(c_1\text{,}\) \(C_2\) as the set of data points closest to \(c_2\text{,}\) and so forth.
  • Repeat the following until the clusters no longer change:
    • Find the centroids \(c_1, c_2,\ldots,c_k\) of the current clusters.
    • Update the clusters \(C_1,C_2,\ldots,C_k\text{.}\)
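The steps above can be sketched in a few lines of numpy. This is a minimal illustration of the algorithm, not a robust implementation (in particular, it assumes no cluster ever becomes empty, which holds for small examples like the one in the activity):

```python
import numpy as np

def kmeans(points, k, seed=0):
    """A minimal k-means sketch. points is an (n, d) array of data points."""
    rng = np.random.default_rng(seed)
    # Choose k distinct data points at random as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    while True:
        # Assign each point to the nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the centroid of its cluster.
        new_centers = np.array([points[labels == j].mean(axis=0)
                                for j in range(k)])
        if np.allclose(new_centers, centers):   # clusters no longer change
            return labels, centers
        centers = new_centers

# The four points from the earlier activity:
pts = np.array([[-2, 1], [1, 1], [1, 2], [3, 2]], dtype=float)
labels, centers = kmeans(pts, k=2)
```

Different seeds correspond to different random initial centers and so may produce different clusterings, just as in the activity.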
The clusterings we find depend on the initial random choice of points \(c_1, c_2,\ldots, c_k\text{.}\) For instance, in the previous activity, we arrived, with the initial choice \(c_1= \vvec_2\) and \(c_2=\vvec_3\text{,}\) at the clustering:
\begin{equation*} \begin{array}{rcl} C_1 \amp {}={} \amp \{\vvec_1\} \\ C_2 \amp {}={} \amp \{\vvec_2, \vvec_3,\vvec_4\} \end{array}\text{.} \end{equation*}
If we instead choose the initial points to be \(c_1 = \vvec_3\) and \(c_2=\vvec_4\text{,}\) we eventually find the clustering:
\begin{equation*} \begin{array}{rcl} C_1 \amp {}={} \amp \{\vvec_1, \vvec_2, \vvec_3\} \\ C_2 \amp {}={} \amp \{\vvec_4\} \end{array}\text{.} \end{equation*}
Is there a way to determine which clustering is the better of the two? A better clustering should be one in which the points in a cluster are, on average, closer to the centroid of their cluster. Given a clustering, we therefore define a function, called the objective, that measures the average of the squares of the distances from each point to the centroid of the cluster to which that point belongs. A clustering with a smaller objective has clusters more tightly centered around their centroids, which we regard as a better clustering.
For example, when we obtain the clustering:
\begin{equation*} \begin{array}{rcl} C_1 \amp {}={} \amp \{\vvec_1, \vvec_2, \vvec_3\} \\ C_2 \amp {}={} \amp \{\vvec_4\} \end{array} \end{equation*}
with centroids \(c_1=\ctwovec{0}{4/3}\) and \(c_2=\vvec_4=\twovec32\text{,}\) we find the objective to be
\begin{equation*} \frac14\left( \len{\vvec_1-c_1}^2 + \len{\vvec_2-c_1}^2 + \len{\vvec_3-c_1}^2 + \len{\vvec_4-c_2}^2 \right) = \frac 53\text{.} \end{equation*}
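This computation is easy to automate. Here is a minimal sketch, assuming each cluster is a list of 2D points paired with its centroid; the helper name `objective` is ours.

```python
def objective(clusters, centers):
    """Average squared distance from each point to the
    centroid of the cluster containing it."""
    total = 0.0
    count = 0
    for cluster, c in zip(clusters, centers):
        for p in cluster:
            total += (p[0] - c[0])**2 + (p[1] - c[1])**2
            count += 1
    return total / count
```

For example, with clusters `[[(0,0), (2,0)], [(5,5)]]` and centroids `[(1,0), (5,5)]`, the squared distances are \(1\text{,}\) \(1\text{,}\) and \(0\text{,}\) so the objective is \(2/3\text{.}\)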

Activity 6.1.5.

We’ll now use the objective to compare clusterings and to choose an appropriate value of \(k\text{.}\)
  1. In the previous activity, one initial choice of \(c_1\) and \(c_2\) led to the clustering:
    \begin{equation*} \begin{array}{rcl} C_1 \amp {}={} \amp \{\vvec_1\} \\ C_2 \amp {}={} \amp \{\vvec_2, \vvec_3,\vvec_4\} \end{array} \end{equation*}
    with centroids \(c_1=\vvec_1\) and \(c_2=\twovec{5/3}{5/3}\text{.}\) Find the objective of this clustering.
  2. We have now seen two clusterings and computed their objectives. Recall that our data set is shown in Figure 6.1.23. Which of the two clusterings feels like the better fit? How is this fit reflected in the values of the objectives?
  3. Evaluating the following cell will load and display a data set consisting of 177 data points. This data set has the name data.
    Given this plot of the data, what would seem like a reasonable number of clusters?
  4. In the following cell, you may choose a value of \(k\) and then run the algorithm to determine and display a clustering and its objective. If you run the algorithm a few times with the same value of \(k\text{,}\) you will likely see different clusterings having different objectives. This is natural since our algorithm starts by making a random choice of points \(c_1,c_2,\ldots,c_k\text{,}\) and different choices may lead to different clusterings. Choose a value of \(k\) and run the algorithm a few times. Notice that clusterings having lower objectives seem to fit the data better. Repeat this experiment with a few different values of \(k\text{.}\)
  5. For a given value of \(k\text{,}\) our strategy is to run the algorithm several times and choose the clustering with the smallest objective. After choosing a value of \(k\text{,}\) the following cell will run the algorithm 10 times and display the clustering having the smallest objective.
    For each value of \(k\) between 2 and 9, find the clustering having the smallest objective and plot your findings in Figure 6.1.26.
    Figure 6.1.26. Construct a plot of the minimal objective as it depends on the choice of \(k\text{.}\)
    This plot is called an elbow plot due to its shape. Notice how the objective decreases sharply when \(k\) is small and then flattens out. This leads to a location, called the elbow, where the objective transitions from being sharply decreasing to relatively flat. This means that increasing \(k\) beyond the elbow does not significantly decrease the objective, which makes the elbow a good choice for \(k\text{.}\)
    Where does the elbow occur in your plot above? How does this compare to the best value of \(k\) that you estimated by simply looking at the data in Item c?
Of course, we could increase \(k\) until each data point is its own cluster. However, this defeats the point of the technique, which is to group together nearby data points in the hope that they share common features, thus providing insight into the structure of the data.
Answer.
  1. The objective is \(5/6\text{.}\)
  2. The clustering \(C_1=\{\vvec_1\}\) and \(C_2 = \{\vvec_2, \vvec_3, \vvec_4\}\) has a smaller objective.
  3. \(k=6\) or \(k=7\text{.}\)
  4. With a fixed value of \(k\text{,}\) running the algorithm several times leads to different clusterings with different objectives. If we increase \(k\text{,}\) the objective generally decreases.
  5. The elbow occurs around \(k=6\) or \(k=7\text{.}\)
Solution.
  1. The objective is
    \begin{equation*} \frac14\left(\len{(\vvec_1 - c_1)}^2 + \len{(\vvec_2 - c_2)}^2 + \len{(\vvec_3 - c_2)}^2 + \len{(\vvec_4 - c_2)}^2\right) = \frac56\text{.} \end{equation*}
  2. The clustering with \(C_1=\{\vvec_1\}\) and \(C_2 = \{\vvec_2, \vvec_3, \vvec_4\}\) appears to be a tighter clustering and has a smaller objective.
  3. It appears that the best clustering is either \(k=6\) or \(k=7\text{.}\)
  4. With a fixed value of \(k\text{,}\) running the algorithm several times leads to different clusterings with different objectives. If we increase \(k\text{,}\) the objective generally decreases.
  5. The elbow occurs around \(k=6\) or \(k=7\text{,}\) which are the values that we felt led to the best clusterings.
We have now seen how our algorithm and the objective identify a reasonable value for \(k\text{,}\) the number of clusters, and produce a good clustering having \(k\) clusters. Notice that we don’t claim to have found the best clustering, since the true test of any clustering is how well it helps us understand the dataset and make predictions about any new data that we may encounter.
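The best-of-several-runs strategy used above can be sketched as follows. This is a self-contained illustration, not code from the text; the names `run_kmeans` and `best_objective` are ours.

```python
import random

def nearest(p, centers):
    """Index of the center nearest to the point p."""
    return min(range(len(centers)),
               key=lambda i: (p[0] - centers[i][0])**2 +
                             (p[1] - centers[i][1])**2)

def run_kmeans(points, k):
    """One run of k-means from a random start; returns the final
    objective: the average squared distance to the centroids."""
    centers = random.sample(points, k)
    labels = [nearest(p, centers) for p in points]
    while True:
        for i in range(k):
            cluster = [p for p, l in zip(points, labels) if l == i]
            if cluster:
                centers[i] = (sum(p[0] for p in cluster) / len(cluster),
                              sum(p[1] for p in cluster) / len(cluster))
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return sum((p[0] - centers[l][0])**2 + (p[1] - centers[l][1])**2
               for p, l in zip(points, labels)) / len(points)

def best_objective(points, k, runs=10):
    """Smallest objective over several random restarts of k-means."""
    return min(run_kmeans(points, k) for _ in range(runs))

# An elbow plot records the minimal objective for each k, e.g.:
# elbow = {k: best_objective(data, k) for k in range(2, 10)}
```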

Subsection 6.1.4 Summary

This section introduced the dot product and used it to investigate geometric relationships between vectors.
  • The dot product of two vectors \(\vvec\) and \(\wvec\) satisfies these properties:
    \begin{equation*} \begin{array}{rcl} \vvec\cdot\vvec \amp {}={} \amp \len{\vvec}^2 \\ \vvec\cdot\wvec \amp {}={} \amp \len{\vvec}\len{\wvec}\cos\theta \\ \end{array} \end{equation*}
    where \(\theta\) is the angle between \(\vvec\) and \(\wvec\text{.}\)
  • The vectors \(\vvec\) and \(\wvec\) are orthogonal when \(\vvec\cdot\wvec= 0\text{.}\)
  • We explored some applications of the dot product to the similarity of vectors, correlation of time series, and \(k\)-means clustering.
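These formulas are straightforward to compute from the components of the vectors. Here is a minimal sketch in Python; the function names are ours, not part of the text.

```python
import math

def dot(v, w):
    """The dot product, computed from the components of v and w."""
    return sum(a * b for a, b in zip(v, w))

def length(v):
    """|v| = sqrt(v . v)."""
    return math.sqrt(dot(v, v))

def angle(v, w):
    """The angle theta between v and w,
    from v . w = |v| |w| cos(theta)."""
    return math.acos(dot(v, w) / (length(v) * length(w)))

# The vectors (1, 2) and (2, -1) have dot product 0,
# so they are orthogonal and the angle between them is pi/2.
```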

Exercises 6.1.5 Exercises

1.

Consider the vectors
\begin{equation*} \vvec=\fourvec203{-2}, \hspace{24pt} \wvec=\fourvec1{-3}41. \end{equation*}
  1. Find the lengths of the vectors, \(\len{\vvec}\) and \(\len{\wvec}\text{.}\)
  2. Find the dot product \(\vvec\cdot\wvec\) and use it to find the angle \(\theta\) between \(\vvec\) and \(\wvec\text{.}\)

2.

Consider the three vectors
\begin{equation*} \uvec = \threevec1{-2}2, \hspace{24pt} \vvec = \threevec111, \hspace{24pt} \wvec = \threevec02{-1}. \end{equation*}
  1. Find the dot products \(\uvec\cdot\uvec\text{,}\) \(\uvec\cdot\vvec\text{,}\) and \(\uvec\cdot\wvec\text{.}\)
  2. Use the dot products you just found to evaluate:
    1. \(\len{\uvec}\text{.}\)
    2. \((-5\uvec)\cdot\vvec\text{.}\)
    3. \(\uvec\cdot(-3\vvec+2\wvec)\text{.}\)
    4. \(\len{\frac1{\len{\uvec}} \uvec}\text{.}\)
  3. For what value of \(k\) is \(\uvec\) orthogonal to \(k\vvec+5\wvec\text{?}\)

3.

Suppose that \(\vvec\) and \(\wvec\) are vectors where
\begin{equation*} \vvec\cdot\vvec = 4, \hspace{24pt} \wvec\cdot\wvec = 20,\hspace{24pt} \vvec\cdot\wvec=8. \end{equation*}
  1. What is \(\len{\vvec}\text{?}\)
  2. What is the angle between \(\vvec\) and \(\wvec\text{?}\)
  3. Suppose that \(t\) is a scalar. Find the value of \(t\) for which \(\vvec\) is orthogonal to \(\wvec+t\vvec\text{.}\)

4.

Suppose that \(\vvec=3\wvec\text{.}\)
  1. What is the relationship between \(\vvec\cdot\vvec\) and \(\wvec\cdot\wvec\text{?}\)
  2. What is the relationship between \(\len{\vvec}\) and \(\len{\wvec}\text{?}\)
  3. If \(\vvec=s\wvec\) for some scalar \(s\text{,}\) what is the relationship between \(\vvec\cdot\vvec\) and \(\wvec\cdot\wvec\text{?}\) What is the relationship between \(\len{\vvec}\) and \(\len{\wvec}\text{?}\)
  4. Suppose that \(\vvec=\threevec{3}{-2}2\text{.}\) Find a scalar \(s\) so that \(s\vvec\) has length 1.

5.

Given vectors \(\vvec\) and \(\wvec\text{,}\) explain why
\begin{equation*} \len{\vvec+\wvec}^2 + \len{\vvec-\wvec}^2 = 2\len{\vvec}^2 + 2\len{\wvec}^2. \end{equation*}
Sketch two vectors \(\vvec\) and \(\wvec\) and explain why this fact is called the parallelogram law.

6.

Consider the vectors
\begin{equation*} \vvec_1=\threevec204,\hspace{24pt} \vvec_2=\threevec{-1}2{-4}, \end{equation*}
and a general vector \(\xvec=\threevec xyz\text{.}\)
  1. Write an equation in terms of \(x\text{,}\) \(y\text{,}\) and \(z\) that describes all the vectors \(\xvec\) orthogonal to \(\vvec_1\text{.}\)
  2. Write a linear system that describes all the vectors \(\xvec\) orthogonal to both \(\vvec_1\) and \(\vvec_2\text{.}\)
  3. Write the solution set to this linear system in parametric form. What type of geometric object does this solution set represent? Indicate with a rough sketch why this makes sense.
  4. Give a parametric description of all vectors orthogonal to \(\vvec_1\text{.}\) What type of geometric object does this represent? Indicate with a rough sketch why this makes sense.

7.

Explain your responses to these questions.
  1. Suppose that \(\vvec\) is orthogonal to both \(\wvec_1\) and \(\wvec_2\text{.}\) Can you guarantee that \(\vvec\) is also orthogonal to any linear combination \(c_1\wvec_1+c_2\wvec_2\text{?}\)
  2. Suppose that \(\vvec\) is orthogonal to itself. What can you say about \(\vvec\text{?}\)

8.

Suppose that \(\vvec_1\text{,}\) \(\vvec_2\text{,}\) and \(\vvec_3\) form a basis for \(\real^3\) and that each vector is orthogonal to the other two. Suppose also that \(\vvec\) is another vector in \(\real^3\text{.}\)
  1. Explain why \(\vvec=c_1\vvec_1+c_2\vvec_2+c_3\vvec_3\) for some scalars \(c_1\text{,}\) \(c_2\text{,}\) and \(c_3\text{.}\)
  2. Beginning with the expression
    \begin{equation*} \vvec\cdot\vvec_1 = (c_1\vvec_1+c_2\vvec_2+c_3\vvec_3)\cdot\vvec_1, \end{equation*}
    apply the distributive property of dot products to explain why
    \begin{equation*} c_1=\frac{\vvec\cdot\vvec_1}{\vvec_1\cdot\vvec_1}. \end{equation*}
    Find similar expressions for \(c_2\) and \(c_3\text{.}\)
  3. Verify that
    \begin{equation*} \vvec_1=\threevec121,\hspace{24pt} \vvec_2=\threevec1{-1}1,\hspace{24pt} \vvec_3=\threevec10{-1} \end{equation*}
    form a basis for \(\real^3\) and that each vector is orthogonal to the other two. Use what you’ve discovered in this problem to write the vector \(\vvec=\threevec35{-1}\) as a linear combination of \(\vvec_1\text{,}\) \(\vvec_2\text{,}\) and \(\vvec_3\text{.}\)

9.

Suppose that \(\vvec_1\text{,}\) \(\vvec_2\text{,}\) and \(\vvec_3\) are three nonzero vectors that are pairwise orthogonal; that is, each vector is orthogonal to the other two.
  1. Explain why \(\vvec_3\) cannot be a linear combination of \(\vvec_1\) and \(\vvec_2\text{.}\)
  2. Explain why this set of three vectors is linearly independent.

10.

Suppose \(\vvec_1, \vvec_2, \dots, \vvec_k\) are pairwise orthogonal nonzero vectors in \(\real^n\text{.}\) Show that these vectors must be linearly independent.

11.

In the next chapter, we will consider certain \(n\by n\) matrices \(A\) and define a function
\begin{equation*} q(\xvec) = \xvec\cdot(A\xvec), \end{equation*}
where \(\xvec\) is a vector in \(\real^n\text{.}\)
  1. Suppose that \(A=\begin{bmatrix} 1 \amp 2 \\ 2 \amp 1 \\ \end{bmatrix}\) and \(\xvec=\twovec21\text{.}\) Evaluate \(q(\xvec) = \xvec\cdot(A\xvec)\text{.}\)
  2. For a general vector \(\xvec=\twovec xy\text{,}\) evaluate \(q(\xvec) = \xvec\cdot(A\xvec)\) as an expression involving \(x\) and \(y\text{.}\)
  3. Suppose that \(\vvec\) is an eigenvector of a matrix \(A\) with associated eigenvalue \(\lambda\) and that \(\vvec\) has length 1. What is the value of the function \(q(\xvec)\text{?}\)

12.

Back in Section 2.1, we saw that equations of the form \(Ax+By = C\) represent lines in the plane. In this exercise, we will see how this expression arises geometrically.
Figure 6.1.27. A line, a point \(\pvec\) on the line, and a vector \(\nvec\) perpendicular to the line.
  1. Find the slope and vertical intercept of the line shown in Figure 6.1.27. Then write an equation for the line in the form \(y=mx+b\text{.}\)
  2. Suppose that \(\pvec\) is a point on the line, that \(\nvec\) is a vector perpendicular to the line, and that \(\xvec=\twovec xy\) is a general point on the line. Sketch the vector \(\xvec-\pvec\) and describe the angle between this vector and the vector \(\nvec\text{.}\)
  3. What is the value of the dot product \(\nvec\cdot(\xvec - \pvec)\text{?}\)
  4. Explain why the equation of the line can be written in the form \(\nvec\cdot\xvec = \nvec\cdot\pvec\text{.}\)
  5. Identify the vectors \(\pvec\) and \(\nvec\) for the line illustrated in Figure 6.1.27 and use them to write the equation of the line in terms of \(x\) and \(y\text{.}\) Verify that this expression is algebraically equivalent to the equation \(y=mx+b\) that you earlier found for this line.
  6. Explain why any line in the plane can be described by an equation having the form \(Ax+By = C\text{.}\) What is the significance of the vector \(\twovec AB\text{?}\)