Section 1.5 Tensors
As we have seen, numpy
treats vectors and matrices as 1-dimensional and 2-dimensional arrays (both created using np.array()
). While mathematically 2-dimensional arrays (i.e, matrices) are the most important (and vectors can be treated like \(n \by 1\) matrices); computationally, there is no reason we need to stop at rows and columns. We could add additional dimensions. Higher dimensional arrays are often called tensors.
There are two primary reasons for taking some time to think about tensors here. First, there are some interesting applications that use tensors. One example of this is color images. Recall that color images store a matrix of values for each color channel. So we have 3 indices into this data: vertical position (row), horizontal position (column), and color channel. If we have many images, we could store them in a 4-dimensional array, using the additional index to indicate which photo is which.
The second reason to discuss tensors is that most of the numpy
code that works with vectors and matrices is designed to work with higher dimensional tensors as well. Understanding this can help demistify how some of the numpy
functions work.
Subsection 1.5.1 Tensors in NumPy
In order to use NumPy, we first load the numpy
package.
Let’s create a small 3-dimensional tensor. Notice that we can assign the shape (as long as the product of the items in our tuple is equal to the number values in the array).
Alternatively, we can use the reshape
method of an array.
The output above expresses A
as two \(4 \by 3\) matrices, and we can separate out those two matrices as A[0]
and A[1]
.
We can refer to individual numbers in an ndarray by providing the required number of indices.
The indexing is more flexible than demonstrated above. In each slot we can provide a slice of values. A slice has the format start:stop:step
. If omitted, start
is taken to be 0, stop
is taken to be the length of the array in that dimension, and step
is taken to be 1. To get all the values in some dimension, we can use :
.
An interesting question is what the shape of the resulting array should be, and we have some control over whether dimensions are dropped or not. Consider the following examples.
The first two methods drop a dimension. The latter two retain all three axes, even though one of them has only one legal index (0). It is always important to know the shape of the arrays you are working with. When in doubt, check.
Example 1.5.1.
An RGB image that is 400 pixels wide and 300 pixels tall might correspond to a tensor with shape (400, 300, 3)
. Or perhaps (3, 300, 400)
, or ... The computer doesn’t really care, but we need to know which slot is being used for what in order to work with this image correctly. The usual arrangement is row, column, channel. This is called the channels-last convention. But somtimes a channels-first convention is used instead.
We can plot images (or other matrices that we would like to view as an image) using a number of different tools. Here we demonstrate how to do so using matplotlib
. matplotlib
supports a variety of image formats. For RGB images, the RGB values can be integers from 0 to 255 or floating point values between 0 and 1.
Notice that dog
contains a 3-dimensional array of unsiged 8-bit integers (0 -- 255) and that the format is channels-last. We can plot individual channels in the same way.
We’ve added a color bar to this image to show how matplotlib is mapping the values 0 to 255 to colors. Remember, this is the red channel. So a colormap that runs from dark blue to bright yellow probably isn’t the best choice. We can use a scale of reds by setting cmap="Reds"
like this
If you use cmap = "Reds_r"
(give it a try), the order of the colors will be reversed and provide a better visual representation (for human eyes -- it’s the same data no matter what color scheme we use for display).
Subsection 1.5.2 Aggregation and Axes
We have already discussed axes in the context of slices that do or do not drop axes. Aggregation (computing a value based on a subset of values in an n-dimensional array) is another place where axes are important.
Example 1.5.3. Row and column sums.
We often want to know the sum or mean of all rows or columns of a matrix. NumPy provides two ways to compute aggregating functions like this.
Understanding the axis
argument can be a little counterintuitive. Notice that to get row sums, we specify axis = 1
and to get column sums, we specify axis = 0
. Here’s a way to think about this. Consider the shape of the result you want. If we are computing row sums, then our result should have two rows and one column, so \(2 \by 1\text{.}\) That is, we have collapsed axis 1: \(2 \by 3 \to 2 \by 1\text{.}\) The axis
argument specifies which axis (or axes) are being collapsed. And then, rather than retaining these axes, they are dropped (unless we specify keepdims = True
).
This way of doing things makes more sense in higher dimensions when there are more options for which axes to collapse.
Example 1.5.4.
NumPy supports several other aggregating functions as well. np.prod()
, np.mean()
, np.std()
, np.var()
, np.min()
, and np.max()
compute the product, mean, standard deviation variance, minimum, and maximum, collapsing over the specified axis or axes. And for boolean arrays, we can use np.all()
and np.any()
.
Replace mean()
with one of the other aggregating functions in the cell below to see how they work.
Subsection 1.5.3 Expanding an array
Sometimes it is useful to build up a matrix or other \(n\)-dimensional array bit by bit or to add an additional "layer" (think row or column) to an existing array. For example, to create an augmented matrix, we will often append an additional column matrix to an existing matrix, increasing the number of columns by one. np.append()
allows us to do this, provided our arrays (a) have the same number of dimensions and (b) compatible shapes along the desired axis.
Here’s an example where we add a column onto a \(3 \by 2 \) matrix.
Notice that np.append()
does not modify its arguments. It returns a new (larger) array.
NumPy provides several other functions that can be used to create larger arrays by combining smaller arrays.
np.concetentate()
takes a sequence of arrays and repeatedly appends. This is especially useful for combining three or more arrays.
np.stack()
(and specialized versions np.hstack()
and np.vstack()
concatenate along a new axis. This can be used to combine vectors into an array or arrays into a tensor. The resulting object will have a higher dimension (more axes) than the inputs.
-
For working with matrices and vectors, np.column_stack()
is handy since will treat 1-dimensional arrays as 2-dimensional column vectors, making it easy to add vectors as new columns of a matrix.
np.block()
creates a new array from a nested sequences of blocks.
np.insert()
allows insertion into a slice.
Example 1.5.5.
In scenes were some regions are very bright and others very dark, it can be challenging to create a good exposure over the entire image. Exposure bracketing is a technique that has been used since the 1850’s to help with this situation. This technique involves capturing multiple images, each with a different exposure. Later one can choose the best of the captured images or combine them in some way. In the 1850’s, Gustave Le Gray printed seascape photos from two images, using one exposure for the sky and the other for the water. Modern HDR methods can combine multiple digital images in much more complex ways.
In this example we stack multiple images with different exposures and use a very simple method (averaging) to convert the stack of images into a single HDR (high dynamic range) image. This works because each images loses some information in the brightest or darkest regions due to the limits on the values (typically 0 to 255) that each pixel may have. The averages will remain in the appropriate range, but do a better job of distinguishing among the very dark and very bright regions of the image.
The images were obtained from
WikiMedia 1 . We will use low resolutoin images below, but you can repeat this example with higher resolution images if you like. Notice the use of
np.stack()
to combine the images into one 4-dimensional tensor, and the use of
np.rint()
and
astype(int)
to make sure the resulting data is again integer (the nearest integer to the average).
More sophisticated algorithms can do an even better job of creating the final image from multiple exposures of the same scene.
Subsection 1.5.4 Broadcasting
According to the NumPy documentation,
The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.
For many operations on arrays, the shapes of the two arrays must be identical. Broadcasting allows us to relax that condition. Two arrays are said to be broadcastable to the same shape if when working from the tail (rightmost) end of the shape each dimension is compatible. Dimensions are compatible if
They are equal, or
One of them is 1, or
One of them is missing.
For example, suppose two arrays \(A\) and \(B\) have shapes \((3, 1, 10)\) and \((5, 10)\text{.}\) Working backwards, we first compare the two 10’s . They are equal, so they are compatible. Then we compare the 1 and the 5. They are not equal, but one of them is 1, so they are compatible. Finally, we compare 3 to a missing dimension. These are also compatbile.
Broadcasting will treat each array is if it had shape \((3, 5, 10)\text{.}\) To do that, each array must be "stretched". Broadcasting treats \(A\) as if it were \(A^*\) with \(A^*[i, j, k] == A[i, 1, k]\text{.}\) Similarly, \(B\) is treated as if it were \(B^*\) with \(B^*[i,j,k] = B[j, k]\text{.}\) That is, each array is treated as if data were copied to fill in the missing dimensions. Importantly for computational efficiency, this copying does not actually happen. But conceptually, we can imagine that copying if it helps us understand how the arrays are made compatible.
The simplest form of broadcasting is when a scalar is broadcast to make it compatible with an array. In this case it is as if an array of the same size were filled entirely with that single scalar value.
Example 1.5.6. Broadcasting.
For each of the ten pairs formed from seven
, A
, B
, C
, and D
, what result to you get when you add? See if you can figure out what the restults will be before executing the code to check.
Exercises 1.5.5 Exercises
1.
Using a color photo of your choosing, convert the image to "black and white" (actually gray scale) by averaging over the three color channels.
commons.wikimedia.org/wiki/File:StLouisArchMultExpEV%2B1.51.JPG