Course

Much of what you need to know to really dive into machine learning is linear algebra, and that is exactly what this tutorial tackles. Today’s post goes over the linear algebra topics that you need to know and understand to improve your intuition for how and when machine learning methods work by looking at the level of vectors and matrices.

By the end of the tutorial, you’ll hopefully feel more confident to take a closer look at an algorithm!

## Contents

## Introduction

One of the three basic topics that you need to master if you want to learn data science is machine learning. And when you’re really getting into machine learning, it’s important to dive deeper and to gather an intuition and understanding of how the algorithms work.

And I think that you, (aspiring) data scientist, will probably agree with me when I say that it can be a tough job to get to this point.

But it doesn’t need to be.

## SciPy versus NumPy

From DataCamp’s NumPy tutorial, you will have gathered that this library is one of the core libraries for scientific computing in Python. This library contains a collection of tools and techniques that can be used to solve on a computer mathematical models of problems in Science and Engineering. But the one tool that you’ll most often use is a high-performance multidimensional array object: it’s a powerful data structure that allows you to efficiently compute arrays and matrices.

Now, SciPy is basically NumPy.

It’s also one of the core packages for scientific computing that provides mathematical algorithms and convenience functions, but it’s built on the NumPy extension of Python. This means that SciPy and NumPy are often used together!

Later on in this tutorial, it will become clear to you how the collaboration between these two libraries has become self-evident.

## Interact with NumPy and SciPy

To interact efficiently with both packages, you first need to know some of the basics of this library and its powerful data structure. To work with these arrays, there’s a huge amount of high-level mathematical functions operate on these matrices and arrays.

In what follows, you’ll get a quick overview of the topics that you need to have mastered in order to work efficiently with SciPy. In essence, you have to know how about the array structure and how you can handle data types and how you can manipulate the shape of arrays. If this sounds alien to you, consider taking DataCamp’s Python NumPy Tutorial and while you’re learning, don’t forget to take a look at the NumPy cheat sheet.

If you already know all of this, skip through the following section and go to “Linear Algebra With SciPy”, but keep your SciPy cheat sheet for linear algebra ready!

### The Essentials of NumPy `ndarray`

Objects

An array is, structurally speaking, nothing but pointers. It’s a combination of a memory address, a data type, a shape and strides. It contains information about the raw data, how to locate an element and how to interpret an element.

The memory address and strides are important when you dive deeper into the lower-level details of arrays, while the data type and shape are things that beginners should surely know and understand. Two other attributes that you might want to consider are the `data`

and `size`

, which allow you to gather even more information on your array.

Refresh the usage of the `ndarray`

attributes in the following DataCamp Light chunk. The array `myArray`

has already been loaded in. You can inspect it by typing it in into the IPython shell and by pressing ENTER.

You’ll see in the results of the code that is included in the code chunk above that the data type of `myArray`

is `int64`

. When you’re intensively working with arrays, you will definitely **remember** that there are ways to convert your arrays from one data type to another with the `astype()`

method.

Nevertheless, when you’re using SciPy and NumPy together, you might also find the following type handling NumPy functions very useful, especially when you’re working with complex numbers:

Try to add `print()`

calls to see the results of the code that is given above. Then, you’ll see that complex numbers have a real and an imaginary part to them. The `np.real()`

and `np.imag()`

functions are designed to return these parts to the user, respectively.

Alternatively, you might also be able to use `np.cast`

to cast an array object to a different data type, such as float in the example above.

The only thing that really stands out in difficulty in the above code chunk is the `np.real_if_close()`

function. When you give it a complex input, such as `myArray`

, you’ll get a real array back if the complex parts are close to zero. This last part, “close to 0”, can be adjusted by yourself with the `tol`

argument that you can pass to the function. Play around with it the treshold to see what happens!

### Array Creation

You have now seen how to inspect your array and to make adjustments in the data type of it, but you haven’t explicitly seen hwo to create arrays. You should already know that you can use `np.array()`

to to this, but there are other routines for array creation that you should know about: `np.eye()`

and `np.identity()`

.

The `np.eye()`

funcion allows you to create a square matrix with dimensions that are equal to the positive integer that you give as an argument to the function. The entries are generally filled with zeros, only the matrix diagonal is filled with ones. The `np.identity()`

function works and does the same and also returns an identity array.

However, **note** that `np.eye()`

can take an additional argument `k`

that you can specify to pick the index of the diagonal that you want to populate with ones.

Other array creation functions that will most definitely come in handy when you’re working with the matrices for linear algebra are the following:

- The
`np.arange()`

function creates an array with uniformly spaced values between two numbers. You can specify the spacing between the elements, - The latter also holds for
`np.linspace()`

, but with this function you specify the number of elements that you want in your array. - Lastly, the
`np.logspace()`

function also creates arrays with uniformly spaced values, but this time in a logarithmic scale. This means that the spacing is now logarithmical: two numbers are evenly spaced between the logarithms of these two to the base of 10.

These functions create arrays in the form of meshes, which can be used to create mesh grids, about which you read earlier in this post.

**Note** that `numpy`

is already imported as `np`

in the code chunk below!

Now that you have refreshed your memory and you know how to handle the data types of your arrays, it’s time to also tackle the topic of indexing and slicing.

#### Indexing and Slicing

With indexing, you basically use square brackets `[]`

to index the array values. In other words, you can use indexing if you want to gain access to selected elements -a subset- of an array. Slicing your data is very similar to subsetting it, but you can consider it to be a little bit more advanced. When you slice an array, you don’t consider just particular values, but you work with “regions” of data instead of pure “locations”.

Refresh both concepts with the following code examples:

Now that your mind is fresh with slicing and indexing, you might also be interested in some index tricks that can make your work more efficient when you’re going into scientific computing together with SciPy. There are four functions that will definitely come up and these are `np.mgrid()`

, `np.ogrid()`

, `np.r`

and `np.c`

.

You might already know the two last functions if you already have some experience with NumPy. `np.r`

and `np.c`

are often used when you need to stack arrays row-wise or column-wise, respectively. With these functions, you can quickly construct arrays instead of using the `np.concatenate()`

function.

By looking at the two first of these four functions, you might ask yourself why you would need a meshgrid. You can use meshgrids to generate two arrays containing the x- and y-coordinates at each position in a rectilinear grid. The `np.meshgrid()`

function takes two 1D arrays and produces two 2D matrices corresponding to all pairs of (x, y) in the two arrays.

- The
`np.mgrid()`

function is an implementation of MATLAB’s meshgrid and returns arrays that have the same shape. That means that the dimensions and number of the output arrays are equal to the number of indexing dimensions. - The
`np.ogrid()`

function, on the other hand, gives an open meshgrid and isn’t as dense as the result that the`np.mgrid()`

function gives. You can see the visual difference between the two in the code chunk above.

You can also read up on the differences between these two functions here.

Another function that you might be able to use for indexing/slicing purposes is the `np.select()`

function. You can use it to return values from a list of arrays depending on conditions, which you can specify yourself in the first argument of the function. In the second argument, you pass the array that you want to consider for this selection process.

Check out the following example:

Awesome! Now that you have selected the right values of your original array, you can still select the shape and manipulate your new array.

#### Shape Selection and Manipulation

NumPy offers a lot of ways to select and manipulate the shape of your arrays and you’ll probably already know a lot of them. The following section will only give a short overview of the functions that might come in handy, so the post won’t cover all of them.

Now, is there such a thing as functions that are handy when you’re working with SciPy?

Well, the ones that are most useful are the ones that can help you to flatten arrays, stack and split arrays. You have already seen the `np.c`

and `np.r`

functions that you’ll often prefer instead of `np.concatenate()`

, but there are many more that you want to know!

Like `np.hstack()`

to horizontally stack your arrays or `np.vstack()`

to vertically stack your arrays. Similarly, you can use `np.vsplit()`

and `np.hsplit()`

to split your arrays vertically and horizontally. But you’ll probably know all of this already.

Prove it in the following code chunk:

**Remember** that the `np.eye()`

function creates a 2X2 identity array, which is perfect to stack with the 2-D array that has been loaded in for you in the code chunk above.

If you want to know more about the conditions that you need to take into account if you want to stack arrays, go here. However, you can also just look at the arrays and what the functions to do gather an intuition of which ‘rules’ you need to respect if you want to join the two arrays by row or column.

The most important thing that you need to take into account when splitting arrays is probably the shape of your array, because you want to select the correct index at which you want the split to occur.

Besides functions to stack and split arrays, you will also want to keep in mind that you have functions that help to ensure that you’re working with arrays of a certain dimension are indispensable when you’re diving deeper into scientific computing.

Consider the following functions:

**Note** the difference between reshaping and resizing your array. With the first, you change the shape of the data but you don’t change the data itself. When you resize, there is the possibility that the data that is contained within the array will change, depending on the shape that you select, of course.

Besides the splitting and stacking routines and the functions that allow you to further manipulate your arrays, there is also something like “vectorization” that you need to consider. When you apply a function to an array, you usually apply it to each element of the array. Consider, for example, applying `np.cos()`

, `np.sin()`

or `np.tan()`

to an array. You’ll see that it works on all array elements. Now, when you see this, you know that the function is vectorized.

But, when you define functions by yourself, as you will most likely do when you’re getting into scientific computing, you might also want to vectorize them. In those cases, you can call `np.vectorize()`

:

When it comes to other vectorized mathematical functions that you might want to know, you should consider `np.angle()`

to provide the angle of the elements of complex array elements, but also basic trigonometric, exponential or logarithmic functions will come in handy.

## Linear Algebra with SciPy

Now that you know what you need to use both packages to your advantage, it’s time to dig into the topic of this tutorial: linear algebra.

But before you go into how you can use Python, make sure that your workspace is completely ready!

### Install SciPy

Of course you first need to make sure firstly that you have Python installed. Go to this page if you still need to do this :) If you’re working on Windows, make sure that you have added Python to the PATH environment variable. In addition, don’t forget to install a package manager, such as `pip`

, which will ensure that you’re able to use Python’s open-source libraries.

Note that recent versions of Python 3 come with pip, so double check if you have it and if you do, upgrade it before you install any other packages:

` pip install pip --upgrade pip --version`

But just installing a package manager is not enough; You also need to download to download the wheel for the library: go here to get your SciPy wheel. After the download, open up the terminal at the download directory on your pc and install it. Additionally, you can check whether the installation was successful and confirm the package version that you’re running:

`# Install the wheel install "scipy‑0.18.1‑cp36‑cp36m‑win_amd64.whl"# Confirm successful installimport scipy# Check package versionscipy.__version__`

After these steps, you’re ready to goy!

**Tip**: install the package by downloading the Anaconda Python distribution. It’s an easy way to get started quickly, as Anaconda not only includes 100 of the most popular Python, R and Scala packages for data science, but also includes several open course development environments such as Jupyter and Spyder. If you’d like to start working with Jupyter Notebook, check out this Jupyter notebook tutorial.

If you haven’t downloaded it already, go here to get it.

### Vectors and Matrices: The Basics

Now that you have made sure that your workspace is prepped, you can finally get started with linear algebra in Python. In essence, this discipline is occupied with the study of vector spaces and the linear mappings that exist between them. These linear mappings can be described with matrices, which also makes it easier to calculate.

**Remember** that a vector space is a fundamental concept in linear algebra. It’s a space where you have a collection of objects (vectors) and where you can add or scale two vectors without the resulting vector leaving the space. Remember also that vectors are rows (or columns) of a matrix.

But how does this work in Python?

You can easily create a vector with the `np.array()`

function. Similarly, you can give a matrix structure to every one-or two-dimensional `ndarray`

with either the `np.matrix()`

or `np.mat()`

commands.

Try it out in the following code chunk:

So arrays and matrices are the same, besides from the formatting?

Well, not exactly. There are some differences:

- A matrix is 2-D, while arrays are usually
`n`

-D, - As the functions above already implied, the matrix is a subclass of
`ndarray`

, - Both arrays and matrices have
`.T()`

, but only matrices have`.H()`

and`.I()`

, - Matrix multiplication works differently from element-wise array multiplication, and
- To add to this, the
`**`

operation has different results for matrices and arrays

When you’re working with matrices, you might sometimes have some in which most of the elements are zero. These matrices are called “sparse matrices”, while the ones that have mostly non-zero elements are called “dense matrices”.

In itself, this seems trivial, but when you’re working with SciPy for linear algebra, this can sometimes make a difference in the modules that you use to get certain things done. More concretely, you can use `scipy.linalg`

for dense matrices, but when you’re working with sparse matrices, you might also want to consider checking up on the `scipy.sparse`

module, which also contains its own `scipy.sparse.linalg`

.

For sparse matrices, there are quite a number of options to create them. The code chunk below lists some:

Additionally, there are also some other functions that you might be able to use to create sparse matrices: Block Sparse Row matrices with `bsr_matrix()`

, COOrdinate format sparse matrices with `coo_matrix()`

, DIAgonal storage sparse matrices with `dia_matrix()`

, and Row-based linked list sparse matrices with `lil_matrix()`

.

There are really a lot of options, but which one should you choose if you’re making a sparse matrix yourself?

It’s not that hard.

Basically, it boils down to first is how you’re going to initialize it. Next, consider what you want to be doing with your sparse matrix.

More concretely, you can go through the following checklist to decide what type of sparse matrix you want to use:

- If you plan to fill the matrix with numbers one by one, pick a
`coo_matrix()`

or`dok_matrix()`

to create your matrix. - If you want to initialize the matrix with an array as the diagonal, pick
`dia_matrix()`

to initialize your matrix. - For sliced-based matrices, use
`lil_matrix()`

. - If you’re constructing the matrix from blocks of smaller matrices, consider using
`bsr_matrix()`

. - If you want to have fast access to your rows and columns, convert your matrices by using the
`csr_matrix()`

and`csc_matrix()`

functions, respectively. The last two functions are not great to pick when you need to initialize your matrices, but when you’re multiplying, you’ll definitely notice the difference in speed.

Easy peasy!

#### Vector Operations

Now that you have learned or refreshed the difference between vectors, dense matrices and sparse matrices, it’s time to take a closer look at vectors and what kind of mathematical operations you can do with them. The tutorial focuses here explicitly on mathematical operations so that you’ll come to see the similarities and differences with matrices, and because a huge part of linear algebra is, ultimately, working with matrices.

You have already seen that you can easily create a vector with `np.array()`

. But now that you have vectors at your disposal, you might also want to know of some basic operations that can be performed on them. `vector1`

and `vector2`

are already loaded for you in the following code chunk:

Now that you have successfully seen some vector operations, it’s time to get started on to the real matrix work!

#### Matrices: Operations and routines

Similarly to where you left it off at the start of the previous section, you know how to create matrices, but you don’t know yet how you can use them to your advantage. This section will provide you with an overview of some matrix functions and basic matrix routines that you can use to work efficiently.

Firstly, let’s go over some functions. These will come quite easily if you have worked with NumPy before, but even if you don’t have any experience with it yet, you’ll see that these functions are easy to get going.

Let’s look at some examples of functions.

There’s `np.add()`

and `np.subtract()`

to add and subtract arrays or matrices, and also `np.divide()`

and `np.multiply`

for division and multiplication. This really doesn’t seem like a big msytery, does it? Also the `np.dot()`

function that you have seen in the previous section where it was used to calculate the dot product, can also be used with matrices. But don’t forget to pass in two matrices instead of vectors.

These are basic, right?

Let’s go a bit less basic. When it comes to multiplications, there are also some other functions that you can consider, such as `np.vdot()`

for the dot product of vectors, `np.inner()`

or`np.outer()`

for the inner or outer products of arrays, `np.tensordot()`

and `np.kron()`

for the Kronecker product of two arrays:

**Tip**: add print statements to the code chunk above to see the individual product results.

Besides these, it might also be useful to consider some functions of the`linalg`

module: the matrix exponential functions `linalg.expm()`

, `linalg.expm2()`

and `linalg.expm3()`

. The difference between these three lies in the ways that the exponential is calculated. Stick to the first one for a general matrix exponential, but definitely try the three of them out to see the difference in results!

Also trigonometric functions such as `linalg.cosm()`

, `linalg.sinm()`

and `linalg.tanm()`

, hyperbolic trigonometric functions such as `linalg.coshm()`

, `linalg.sinhm()`

and `linalg.tanhm()`

, the sign function `linalg.signm()`

, the matrix logarithm `linalg.logm()`

, and the matrix square root `linalg.sqrtm()`

.

Additionally, you can also evaluate a matrix function with the help of the `linalg.funm()`

function. Check out the example below:

You see that you pass in the matrix to which you want to apply a function as a first argument and a function (in this case a lambda function) that you want to apply to the matrix you passed. Note that the function that you pass to `linalg.funm()`

has to be vectorized.

Let’s now take a look at some basic matrix routines. The first thing that you probably want to check out are the matrix attributes: `T`

for transposition, `H`

for conjugate transposition, `I`

for inverse, and `A`

to cast as an array.

Try them out:

When you tranpose a matrix, you make a new matrix whose rows are the columns of the original. A conjugate transposition, on the other hand, interchanges the row and column index for each matrix element. The inverse of a matrix is a matrix that, if multiplied with the original matrix, results in an identity matrix.

But besides those attributes, there are also real functions that you can use to perform some basic matrix routines, such as `np.transpose()`

and `linalg.inv()`

for transposition and matrix inverse, respectively.

Besides these, you can also retrieve the trace or sum of the elements on the main matrix diagonal with `np.trace()`

. Similarly, you can also retrieve the matrix rank or the number of Singular Value Decomposition singular values of an array that are greater than a certain treshold with `linalg.matrix_rank`

from NumPy.

Don’t worry if the matrix rank doesn’t make sense for now; You’ll see more on that later on in this tutorial.

For now, let’s focus on two more routines that you can use:

- The norm of a matrix can be computed with
`linalg.norm`

: a matrix norm is a number defined in terms of the entries of the matrix. The norm is a useful quantity which can give important information about a matrix because it tells you how large the elements are. - On top of that, you can also calculate the determinant, which is a useful value that can be computed from the elements of a square matrix, with
`linalg.det()`

. The determinant boils down a square matrix to a a single number, which determines whether the square matrix is invertible or not.

Lastly, solving large systems of linear equations are one of the most basic applications of matrices. If you have a system of \(Ax = b\), where \(A\) is a square matrix and \(b\) a general matrix, you have two methods that you can use to find \(x\), depending of course on which type of matrix you’re working with:

To solve sparse matrices, you can use `linalg.spsolve()`

. When you can not solve the equation, it might still be possible to obtain an approximate \(x\) with the help of the `linalg.lstsq()`

command.

**Tip**: don’t miss DataCamp’s SciPy cheat sheet.

Now that you have gotten a clue on how you can create matrices and how you can use them for mathematical operations, it’s time to tackle some more advanced topics that you’ll need to really get into machine learning.

### Eigenvalues and Eigenvectors

The first topic that you will tackle are the eigenvalues and eigenvectors.

Eigenvalues are a new way to see into the heart of a matrix. But before you go more into that, let’s explain first what eigenvectors are. Almost all vectors change direction, when they are multiplied by a matrix. However, certain exceptional, resulting vectors are in the same direction as the vectors that are the result of the multiplication. These are the eigenvectors.

In other words, multiply an eigenvector by a matrix, and the resulting vector of that multiplication is equal to a multiplication of the original eigenvector with \(\lambda\), the eigenvalue: \[Ax = \lambda x. \]

This means that the eigenvalue gives you very valuable information: it tells you whether one of the eigenvectors is stretched, shrunk, reversed, or left unchanged—when it is multiplied by a matrix.

You use the `eig()`

function from the `linalg`

SciPy module to solve ordinary or generalized eigenvalue problems for square matrices.

Note that the `eigvals()`

function is another way of unpacking the eigenvalues of a matrix.

When you’re working with sparse matrices, you can fall back on the module `scipy.sparse`

to provide you with the correct functions to find the eigenvalues and eigenvectors:

`la, v = sparse.linalg.eigs(myMatrix,1)`

**Note** that the code above specifies the number of eigenvalues and eigenvectors that has to be retrieved, namely, 1.

The eigenvalues and eigenvectors are important concepts in many computer vision and machine learning techniques, such as Principal Component Analysis (PCA) for dimensionality reduction and EigenFaces for face recognition.

### Singular Value Decomposition (SVD)

Next, you need to know about SVD if you want to really learn data science. The singular value decomposition of a matrix \(A\) is the decomposition or facorization of \(A\) into the product of three matrices: \(A = U * \Sigma * V^t\).

The size of the individual matrices is as follows if you know that matrix \(A\) is of size \(M\) x \(N\):

- Matrix \(U\) is of size \(M\) x \(M\)
- Matrix \(V\) is of size \(N\) x \(N\)
- Matrix \(\Sigma\) is of size \(M\) x \(N\)

The \(*\) indicates that the matrices are multiplied and the \(^t\) that you see in \(V^t\) means that the matrix is transposed, which means that the rows and columns are interchanged.

Simply stated, singular value decomposition provides a way to break a matrix into simpler, meaningful pieces. These pieces may contain some data we are interested in.

**Note** that for sparse matrices, you can use the `sparse.linalg.svds()`

function to perform the decomposition.

If you’re new to data science, the matrix decomposition will be quite opaque for you: you might not immediately see any use cases to apply this. But SVD is useful in many tasks, such as data compression, noise reduction and data analysis. In the following, you will see how SVD can be used to compress images:

`# Import the necessary packagesimport numpy as npfrom scipy import linalgfrom skimage import dataimport matplotlib.pyplot as plt# Get an image from `skimage`img= data.camera()# Check number of singular valueslinalg.svdvals(img)# Singular Value DecompositionU, s, Vh = linalg.svd(img)# Use only 32 singular valuesA = np.dot(U[:,0:32], np.dot(np.diag(s[0:32]), Vh[0:32,:]))fig = plt.figure(figsize=(8, 3))# Add a subplot to the figureax = fig.add_subplot(121)# Plot `img` on grayscaleax.imshow(img, cmap='gray')# Add a second subplot to the figureax2 = fig.add_subplot(122)# Plot `A` in the second subplotax2.imshow(A)# Add a titlefig.suptitle('Image Compression with SVD', fontsize=14, fontweight='bold')# Show the plotplt.show()`

Which will give you the following result:

Consider also the following examples where SVD is used:

- SVD is closely linked to Principal Component Analysis (PCA), which is used for dimensionality reduction: both result in a set of “new axes” that are constructed from linear combinations of the the feature space axes of your data. These “new axes” break down the variance in the data points based on each direction’s contribution to the variance in the data. To see a concrete example of how PCA works on data, go to our Scikit-Learn Tutorial.
- Another link is one with data mining and natural language processing (NLP): Latent Semantic Indexing (LSI). It s a technique that is used in document retrieval and word similarity. Latent semantic indexing uses SVD to group documents to the concepts that could consist of different words found in those documents. Various words can be grouped into a concept. Also here, SVD reduces the noisy correlation between words and their documents, and it decreases the number of dimensions that the original data has.

You see, SVD is an important concept in your data science journey that you must cover. That’s why you should consider going deeper into SVD than what this tutorial covers: for example, go to this page to read more about this matrix decomposition.

## What’s up Next?

You’ve made it to the end of the tutorial! Where you go on out from here is totally up to you.

But hold up!

Don’t miss out on courses that go deeper into linear algebra: this tutorial has only been an introduction to the topic and hasn’t covered everything!

Of course, also consider taking DataCamp’s Machine Learning tutorial, which will definitely add value to your learning curriculum after going through this Scipy tutorial about linear algebra. But, if you want to go back to the basics, go through our NumPy tutorial or the Intermediate Python for Data Science course.

Learn more about Python

Course

### Introduction to Python

Course

### Introduction to Functions in Python

cheat-sheet

### SciPy Cheat Sheet: Linear Algebra in Python

tutorial

### Python Numpy Array Tutorial

tutorial

### Python Arrays

DataCamp Team

3 min

tutorial

### Machine Learning Basics - The Norms

tutorial

### Python Machine Learning: Scikit-Learn Tutorial

code-along

### NumPy Crash Course

Izzy Weber