HomeTutorialsData Science

# QR Decomposition in Machine Learning: A Detailed Guide

Learn about QR decomposition, the matrix factorization technique that decomposes matrix A into the product of an orthogonal matrix Q and an upper triangular matrix R.
Aug 2024  · 12 min read

QR decomposition is a fundamental matrix factorization technique widely used in various fields of data science and machine learning. Its ability to break down complex matrices into simpler pieces makes it a much-needed method for solving linear algebra problems and improving the stability of algorithms.

Whether you're performing ordinary least squares linear regression or building recommendation engines in Python or PySpark, understanding QR decomposition can significantly add to your ability to work with large datasets and complex mathematical models.

## What is QR Decomposition?

When you factorize, or decompose, a matrix, you express that matrix as the product of two or more matrices. In other words, matrix factorization allows you to break down a matrix into simpler components, which, when multiplied together using matrix multiplication, will reconstruct the original matrix. We care about matrix factorization because it simplifies many linear algebra problems, making them easier to solve.

A QR decomposition is a specific kind of matrix factorization technique that is used to factorize a matrix into the product of an orthogonal matrix Q and an upper triangular matrix R. The equation we commonly see is A = QR

QR decomposition visualized. Source: ResearchGate

In order to understand QR decomposition, you have to appreciate that in linear algebra, multiplication of matrices doesn't follow the same intuitive rules as scalar multiplication. To be specific, multiplying matrices isn't commutative, meaning AB ≠ BA in general, and therefore, the notion of "reverse" is more complex.

## What are the Q and R Matrices?

Let’s take a look at the two important matrices, Q and R.

### Orthogonal matrix Q

An orthogonal matrix Q is characterized by the property that its transpose is also its inverse, which is mathematically represented as QTQ = QQT  =  I, where I is the identity matrix. Transposing, for context, means flipping the matrix over its diagonal.

An orthogonal matrix is also characterized by the fact that its columns form an orthonormal basis. This means two things: For one, the columns are orthogonal, which is to say that the dot product of any pair of distinct columns in the matrix is zero. And two, each column has a magnitude, or unit length, of one.

Orthogonal matrices are good to work with because they preserve the lengths of vectors and their dot products. So, if you have a vector and you multiply it by an orthogonal matrix, the length of that vector stays the same. Also, if you have two vectors and you multiply them both by the same orthogonal matrix, the dot product between them also stays the same. Finally, because orthogonal matrices preserve lengths and dot products, they don't distort the geometric properties of the vectors after our transformation. That is, if two vectors are perpendicular before the transformation, they will remain perpendicular after applying the orthogonal matrix.

### Upper triangular matrix R

An upper triangular matrix R is a type of matrix where all the elements below the main diagonal are zero. This form simplifies matrix equations and is particularly useful in solving systems of linear equations through back substitution or another method. We could also say that, in QR decomposition, R represents the component that captures the original matrix's magnitude and orientation.

## Ways to Perform QR Decomposition

Let’s take a quick look at some of the different ways to perform QR decomposition.

### The Gram Schmidt process

The Gram Schmidt process is an algorithm that is used to systematically convert a set of linearly independent vectors into an orthonormal set. This process is used specifically for constructing the Q matrix in our QR decomposition and, after we have Q, we can derive R

### Givens rotations

Givens rotations is another method to help with QR decomposition. With Givens rotations, a series of plane rotations is applied to introduce zeros below the diagonal of the matrix A. Each rotation acts on a pair of coordinates, preserving the length of the vector while transforming the matrix into an upper triangular form. This method is efficient for sparse matrices, as it only modifies a small portion of the matrix at each step.

### Householder reflections

Householder reflections are a more general technique that uses reflection matrices, known as Householder matrices, to systematically eliminate the sub-diagonal elements of matrix A. Each Householder matrix reflects a vector across a plane orthogonal to the vector chosen to zero out a column below the diagonal. The result is an orthogonal matrix Q, and an upper triangular matrix R. Householder reflections are particularly good for dense matrices because of their numerical stability.

## Implementing QR Decomposition in R

In this section, we will practice QR decomposition in R by utilizing both the built-in `qr()` function for a quick implementation and then walking through the process from scratch using the Gram Schmidt process to build our intuition. Let’s start by creating `matrix_A`, which we will use in both sections.

``````matrix_A <- matrix(c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3, 19.2, 27.3, 26, 30.4, 15.8, 19.7, 15, 21.4,
2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.2, 1.615, 1.835, 2.465, 3.52, 3.435, 3.84, 3.845, 1.935, 2.14, 1.513, 3.17, 2.77, 3.57, 2.78,
110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180, 205, 215, 230, 66, 52, 65, 97, 150, 150, 245, 175, 66, 91, 113, 264, 175, 335, 109),
nrow = 32, ncol = 3, byrow = FALSE)

colnames(matrix_A) <- c("column_1", "column_2", "column_3")``````

### QR decomposition using the qr() function

We can do this factorization in one step with the `qr()` function. We assign the result to a `qr_decomp` object, which is a list containing the QR decomposition.

``````# Perform and Display the QR decomposition
(qr_decomp <- qr(matrix_A))``````

When we access the rank component of the QR decomposition object we can see the number of linearly independent columns in our decomposed matrix. The output indicates that the matrix has a rank of three, meaning there are three linearly independent columns, which is what we expect. When we access the pivot component, we can see information about the permutation of columns used during the decomposition process. The output indicates that the columns of the matrix were not permuted during the decomposition, as the pivot indices match the original column order.

``````qr_decomp\$rank
qr_decomp\$pivot``````
``````[1] 3
[1] 1 2 3``````

As a next step, we can access the Q matrix and the R matrix using the `qr.Q()` and `qr.R()` functions, respectively. We can see in the output that the Q matrix is rectangular and the R matrix is square with zeros below the diagonal.

``````# Extract and Display the Q and R matrices
(Q_matrix <- qr.Q(qr_decomp))
(R_matrix <- qr.R(qr_decomp))``````
``````           [,1]         [,2]         [,3]
[1,] -0.17721481  0.023462372 -0.015872336
[2,] -0.17721481 -0.001889138 -0.060272525
[3,] -0.19240465  0.077625112 -0.019039920
[4,] -0.18059033 -0.030282834 -0.118470211
[5,] -0.15780557 -0.089157964  0.064671266
[6,] -0.15274229 -0.099258795 -0.187022354
[7,] -0.12067485 -0.161573766  0.277715036
[8,] -0.20590673  0.012764996 -0.275768785
[9,] -0.19240465 -0.004891570 -0.156509401
[10,] -0.16202497 -0.082397566 -0.117343880
[11,] -0.15021065 -0.101326680 -0.120852866
[12,] -0.13839633 -0.182888939 -0.033166125
[13,] -0.14599125 -0.136918208  0.028289904
[14,] -0.12826977 -0.170282764  0.014320506
[15,] -0.08776353 -0.381326588 -0.165554550
[16,] -0.08776353 -0.398625266 -0.160607236
[17,] -0.12405037 -0.332631845 -0.083208412
[18,] -0.27341714  0.219354877 -0.069242665
[19,] -0.25653954  0.250472633 -0.021737724
[20,] -0.28607534  0.275923528 -0.005454217
[21,] -0.18143421  0.045632512 -0.033447865
[22,] -0.13080141 -0.140377926 -0.045388530
[23,] -0.12826977 -0.135983661 -0.031340393
[24,] -0.11223605 -0.201937338  0.228196652
[25,] -0.16202497 -0.122661730 -0.004593473
[26,] -0.23037925  0.176744505 -0.035884026
[27,] -0.21940881  0.138786843  0.013273146
[28,] -0.25653954  0.260613237  0.211010230
[29,] -0.13333305 -0.101525496  0.418085457
[30,] -0.16624437 -0.009027316  0.183837006
[31,] -0.12658201 -0.152109209  0.596664759
[32,] -0.18059033  0.012963861 -0.046253102``````
``````      column_1  column_2  column_3
[1,] -118.5003 -16.11602 -711.9200
[2,]    0.0000 -10.05857 -496.9319
[3,]    0.0000   0.00000  283.7369``````

As a final step, because we are curious, we can check which method the `qr()` function uses for decomposition.

``?qr``

We can see the following as part of the output:

``The function computes the QR decomposition of a matrix using the LINPACK or LAPACK routine dgeqrf.``

If we were to check FORTRAN documentation, we would see that `dgeqrf` is a LAPACK routine that uses Householder transformations.

### QR decomposition using the Gram Schmidt process

Let’s try to understand what is happening with the `qr()` function by recreating not the same implementation but a similar idea ourselves by following the Gram Schmidt process and thinking. The purpose of this exercise is not to recreate an exact method but rather to build our understanding.

#### Constructing the Q matrix

In order to start creating our Q matrix using the Gram Schmidt process, we need to normalize the first column of the subset data to obtain the first column of Q. The goal here is to transform our set of vectors in matrix A into a new set vectors with a unit length of one, and to make sure these vectors are orthogonal to each other.

To find the first column of Q, we extract the first column of A, calculate the Euclidean norm, then perform normalization.

To review, the Euclidean norm is a measure of the length or magnitude of a vector in Euclidean space. We consider the distance from the origin to the end of the vector when considering each vector element as a coordinate in multi-dimensional space. For a vector v = [v1​,v2​,…,vn​], its Euclidean norm is calculated in the following way:

So, to implement in R, we square each column, take the sum, then the square root, and, finally, we normalize.

``````# Extract the first column of the subset data
col_1 <- matrix_A[, 1]

# Calculate the Euclidean norm of the first column
norm_col_1 <- sqrt(sum(col_1^2))

# Normalize and display the first column to get the first column of Q
(q1 <- col_1 / norm_col_1)``````

The second thing we need to review from linear algebra is the concept of a projection. A projection refers to a mathematical operation that projects a vector onto a subspace. The subspace can be thought of as a lower-dimensional space within the larger vector space. The resulting projection is a vector that lies within the subspace and is closest to the original vector being projected, which is also orthogonal to the subspace.

Let’s try to visualize with the following graph, where the projection p is the point closest to a from b, which must be perpendicular to b.

Illustration of a projection. Image by Author.

In the next step of QR factorization, we calculate the second orthogonal vector by first calculating the projection of the second column of A onto the first column of Q, and then subtracting that result from the second column of A. The resulting vector will be orthogonal to the first vector. As a final step, we normalize this vector to obtain the second column of Q.

Let’s review the equation for a projection. The projection is the matrix that is a times a-transpose over the scalar value that is a-transpose times a, multiplied by b.

Or, alternatively:

We might be wondering: How are these two methods the same? Well, when we look at the first equation above, the part to the left of b computes a matrix that essentially captures the direction defined by vector a and the scaling factor for that direction. This is similar to calculating the unit vector of a in the dot product method.

The following R code calculates the projection of the vector `col_2` onto the vector `q1` using the dot product method.

``````# Extract the second column of the subset data
col_2 <- matrix_A[, 2]

# Calculate the projection of col_2 onto q1
projection_col_2_q1 <- sum(col_2 * q1) * q1

# Calculate the second orthogonal vector
v2 <- col_2 - projection_col_2_q1

# Calculate the Euclidean norm of the second orthogonal vector
norm_v2 <- sqrt(sum(v2^2))

# Normalize the second orthogonal vector to get the second column of Q
q2 <- v2 / norm_v2``````

As we can see in the code above, constructing the second orthogonal vector and the corresponding column of the matrix Q involves a process of projection, subtraction, and normalization. The basic steps are similar to creating the first column of Q, but they are a bit different and slightly more involved: We first extract the second column of A; we then create a projection onto the first vector of Q; we then subtract the projection from the second column of A; we then perform normalization; finally, we create the second column of Q.

We can repeat this second step for `q3` all the way to the last column. In our case, the orthogonal vectors `q1`, `q2`, and `q3` ​form the basis of the orthogonal matrix Q. These vectors are essential because they provide the foundation for expressing the original data in a new coordinate system.

Here, we see that the Gram Schmidt process is an iterative process that ensures that each successive vector is orthogonal to the previous ones. We also remember that orthogonality means that the dot product of two vectors is zero, indicating that they are perpendicular to each other in the vector space. Let’s confirm the orthogonality of `q1` and `q2` using their dot product:

``````# Calculate the dot product of q1 and q2
dot_product_q1_q2 <- sum(q1 * q2)

# Print the dot product to confirm orthogonality
print(dot_product_q1_q2)``````
``[1] 2.43295e-16``

#### Constructing the R matrix

Now that we have our Q matrix, we need to construct the R matrix. The R matrix in QR decomposition contains information about the relationships between the columns of the orthogonal matrix Q and the original dataset. It is an upper triangular matrix, and each element of the upper triangular part of R is obtained by calculating the dot product of the corresponding columns of Q and the original data.

To construct the upper triangular matrix R, we need to calculate the dot products of each column of Q with each column of the original dataset. This will result in a matrix that has the same number of rows as the number of columns in Q, and it will be upper triangular due to our previous process.

``````matrix_A <- matrix_A %>% as.matrix()

# Calculate and display the R matrix by multiplying Q^T and the subset data
R_matrix <- t(Q_matrix) %*% matrix_A``````

We see that the output of our R matrix looks a bit different this time. The reason is because the `qr()` function uses Householder reflections with more numerical stability than the Gram Schmidt-inspired process that we created ourselves.

``````         column_1      column_2  column_3
[1,] -1.185003e+02 -1.611602e+01 -711.9200
[2,]  5.814446e-16 -1.005857e+01 -496.9319
[3,]  7.633287e-15  1.338669e-15  283.7369``````

We can verify the reconstruction by multiplying Q and R and seeing that we get back `matrix_A`. Here, `%*%` is for matrix multiplication in R.

``Q_matrix %*% R_matrix``
``````   column_1 column_2 column_3
[1,]     21.0    2.620      110
[2,]     21.0    2.875      110
[3,]     22.8    2.320       93
[4,]     21.4    3.215      110
[5,]     18.7    3.440      175
[6,]     18.1    3.460      105
[7,]     14.3    3.570      245
[8,]     24.4    3.190       62
[9,]     22.8    3.150       95
[10,]     19.2    3.440      123
[11,]     17.8    3.440      123
[12,]     16.4    4.070      180
[13,]     17.3    3.730      180
[14,]     15.2    3.780      180
[15,]     10.4    5.250      205
[16,]     10.4    5.424      215
[17,]     14.7    5.345      230
[18,]     32.4    2.200       66
[19,]     30.4    1.615       52
[20,]     33.9    1.835       65
[21,]     21.5    2.465       97
[22,]     15.5    3.520      150
[23,]     15.2    3.435      150
[24,]     13.3    3.840      245
[25,]     19.2    3.845      175
[26,]     27.3    1.935       66
[27,]     26.0    2.140       91
[28,]     30.4    1.513      113
[29,]     15.8    3.170      264
[30,]     19.7    2.770      175
[31,]     15.0    3.570      335
[32,]     21.4    2.780      109``````

## Learn R for Machine Learning

Master core R skills to become a machine learning scientist

## QR Decomposition in Linear Regression

QR decomposition is used by the `lm()` function every time it is implemented in R. The reason is because QR decomposition is a numerically stable and efficient method for solving linear systems, especially when dealing with large datasets or when the predictor variables, by which we really mean the columns of the design matrix, are nearly collinear.

We can see how `lm()` uses QR decomposition by simply typing `lm` into the console and looking at the function signature where it says `method = "qr"` and `qr = TRUE`.

``lm``
``````lm <- function (formula, data, subset, weights, na.action,
method = "qr", model = TRUE, x = FALSE, y = FALSE,
qr = TRUE, singular.ok = TRUE, contrasts = NULL,
offset, ...)``````

There are alternative methods to find the coefficients of a linear regression. We could, for example, find the linear regression coefficients by solving what is known as the normal equation. We could even consider using calculus by solving the partial derivatives of the sum of squared errors with respect to our coefficients and setting them to zero. Or, in the case of the simple linear regression, meaning regression with only one independent and one dependent variable, we could find the slope of the regression line by multiplying the correlation r by the quotient of standard deviation of y over standard deviation of x

Of all these methods, however, QR decomposition might be the best because it strikes a balance between stability and efficiency. The normal equation in particular, which is ATA x = ATB, is computationally very expensive. The reason is because, in order to solve for x, we write the equation as x = ((ATA)-1 ) ATB. The part of this equation that requires us to find the inverse of ATA has a time complexity of O(n3) which means that, as the matrix doubles in size, the time required for inversion increases by a factor of eight. This makes it a bad choice for high-dimensional problems.

## Final Thoughts

QR decomposition is not just a concept local to linear algebra. It is a method with broad uses in data science and machine learning. By understanding and applying QR decomposition, you can increase your ability to deal with complex problems, as when dealing with large datasets and linear regression estimators.

Also, in the event you find decomposition interesting, I want to add that decomposition is actually a more general idea in data science. For example, time series decomposition means breaking a time series into a trend-cycle, seasonality, and a remainder component. When added or multiplied together, these components bring back the original time series. Another example is the Fourier Transform, which decomposes an image into its frequency components, which can be useful in signal processing

As you continue to explore advanced topics in data science, you’ll find that concepts like QR decomposition are closely tied to other foundational areas. Take our courses: Linear Algebra for Data Science in R and Introduction to Statistics. Mastering these interconnected subjects will help you tackle even more challenging and varied data science problems.

## Get Certified in Data Science

Validate your professional data scientist skills.

Author
Josef Waples
Topics

Learn with DataCamp

Course

### .css-1531qan{-webkit-text-decoration:none;text-decoration:none;color:inherit;}Linear Algebra for Data Science in R

4 hr
14.7K
This course is an introduction to linear algebra, one of the most important mathematical topics underpinning data science.
See Details
Start Course

Course

### Introduction to Statistics in R

4 hr
90.8K
Grow your statistical skills and learn how to collect, analyze, and draw accurate conclusions from data.

Track

### Machine Learning Scientist

65hrs hr
A machine learning scientist researches new approaches and builds machine learning models.
See More
Related

tutorial

### Matrices in R Tutorial

Learn all about R's matrix, naming rows and columns, accessing elements also with computation like addition, subtraction, multiplication, and division.

Olivia Smith

7 min

tutorial

### Factor Levels in R

This tutorial takes course material from DataCamp's free Intro to R course and allows you to practice Factors.

Ryan Sheehy

6 min

tutorial

### Decision Trees in Machine Learning Using R

A comprehensive guide to building, visualizing, and interpreting decision tree models with R.

Arunn Thevapalan

27 min

tutorial

### Principal Component Analysis in R Tutorial

In this tutorial, you'll learn how to use R PCA (Principal Component Analysis) to extract data with many variables and create visualizations to display that data.

Zoumana Keita

15 min

tutorial

### Factors in R Tutorial

Learn about the factor function in R, along with an example, and it's structure, order levels, renaming of the levels, and finally, with the ordering of categorical values.

Olivia Smith

5 min

tutorial

### Mastering Data Structures in the R Programming Language

Read our comprehensive guide on how to work with data structures in R programming: vectors, lists, arrays, matrices, factors, and data frames.

Vikash Singh

6 min

See MoreSee More