Subsection B.2 Packages and libraries for data science
Python packages can be loaded using the import statement. We can load entire packages, load submodules, or even load individual objects. See below for some examples.
There are several add-on packages that are useful for linear algebra. The ones used here are part of the SciPy "ecosystem of open-source software for mathematics, science, and engineering." Core packages in this suite include
-
NumPy (import numpy as np). NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
Note: NumPy provides both general n-dimensional arrays (np.array()), which we will mostly use in their 2-dimensional form, and matrices (np.matrix()). The numpy.matrix class makes certain things a bit more convenient, but we will follow SciPy's advice: "Despite its convenience, the use of the numpy.matrix class is discouraged, since it adds nothing that cannot be accomplished with 2-D numpy.ndarray objects, and may lead to a confusion of which class is being used."
-
SciPy Linear Algebra (from scipy import linalg). SciPy is a large library with many packages and submodules. We will primarily use scipy.linalg, which contains many linear algebra functions.
Note that np.linalg also exists. Here's what SciPy has to say about that: "scipy.linalg contains all the functions in numpy.linalg plus some other more advanced ones not contained in numpy.linalg. Another advantage of using scipy.linalg over numpy.linalg is that it is always compiled with BLAS/LAPACK support, while for numpy this is optional. Therefore, the scipy version might be faster depending on how numpy was installed. Therefore, unless you don't want to add scipy as a dependency to your numpy program, use scipy.linalg instead of numpy.linalg."
-
Matplotlib (import matplotlib.pyplot as plt). Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible.
-
SymPy (import sympy). SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python.
-
pandas (import pandas as pd). pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
The primary feature of pandas that we will use is its support for data frames. In a data frame, each row represents an observational unit and each column represents a variable. Different variables may be of different types and need not be numeric. pandas allows us to read data frames in from files of various formats (e.g., CSV files) and to perform data wrangling on the result.
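As a rough illustration of the packages listed above, the following sketch shows the three import forms (a whole package, a submodule, a single object), a small linear system solved with scipy.linalg on ordinary 2-D ndarray objects rather than numpy.matrix, and a tiny data frame with columns of different types. The matrix, the right-hand side, and the data frame contents are made up for illustration and do not come from the text.

```python
import numpy as np                  # an entire package, under a short alias
import pandas as pd                 # another entire package
from scipy import linalg            # one submodule of a larger package
from scipy.linalg import norm       # a single object from a submodule

# A 2-D ndarray (not np.matrix), following the SciPy advice quoted above.
# These values are illustrative only.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Solve the linear system A x = b with scipy.linalg.
x = linalg.solve(A, b)              # approximately [0.8, 1.4]

# For ndarrays, @ is matrix multiplication; the residual should be near 0.
residual = A @ x - b
residual_size = norm(residual)

# The determinant is another routine provided by scipy.linalg.
det_A = linalg.det(A)               # approximately 5.0

# A small data frame: each row is an observational unit, each column a
# variable, and the variables have different types (hypothetical data).
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "height_cm": [150.0, 162.5, 171.0],
    "smoker": [False, True, False],
})
n_rows, n_cols = df.shape
```

In this sketch the same operations could be written with np.linalg.solve and np.linalg.det; scipy.linalg is used only to match the recommendation quoted in the list above.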
seaborn.objects and plotly provide alternatives to matplotlib for plotting.
-
seaborn.objects is "a completely new interface for making seaborn plots. It offers a more consistent and flexible API, comprising a collection of composable classes for transforming and plotting data. In contrast to the existing seaborn functions, the new interface aims to support end-to-end plot specification and customization without dropping down to matplotlib (although it will remain possible to do so if necessary)."
-
Unlike seaborn.objects, Plotly is built on top of a JavaScript graphics library rather than on top of Matplotlib. Plotly was designed with interactive graphics in mind from the start and provides interfaces for a number of languages, including Python, R, and Julia.