Singular Value Decomposition (SVD): What You Need to Know

Singular Value Decomposition (SVD) is a matrix factorization method that breaks any matrix into three simpler components, revealing its underlying structure.

May 18, 2026 · 12 min read

Have you ever tried to extract useful patterns from a dataset with thousands of features?

You know that a massive dataset must have some useful structure buried in it. The problem is, raw datasets carry a lot of noise, redundancy, missing values, and way more dimensions than you actually need. Many machine learning algorithms struggle with this kind of data or, at best, train much more slowly.

Singular Value Decomposition (SVD) breaks any matrix (dataset in this case) into three simpler matrices that show its core structure. It's the math behind recommendation systems, image compression, and dimensionality reduction techniques like PCA - and once you understand it, you'll see it everywhere in your day job.

In this article, I'll walk you through what SVD is, how it works, where it's used in data science, and when you should reach for an alternative instead.

Do you find concepts like vectors and determinants confusing? Read our Demystifying Mathematical Concepts for Deep Learning post before continuing with this one.

What Is Singular Value Decomposition (SVD)?

SVD is a method that breaks any matrix into three simpler matrices.

Think of it this way. You have a matrix A - it could be a dataset or an image. SVD splits A into three pieces:

A = U \Sigma V^*

U is an m x m orthogonal matrix (unitary if A is complex). Its columns are called left singular vectors, and they describe the relationships between the rows of A.
\Sigma is an m x n diagonal matrix. The values on the diagonal are the singular values - always non-negative and, by convention, sorted from largest to smallest.
V* is the conjugate transpose of an n x n orthogonal matrix V (for real matrices, V* is simply the transpose of V). The columns of V, which are the rows of V* in the real case, are called right singular vectors, and they describe the relationships between the columns of A.

Each piece shows something different about the original data. U holds the row-level patterns (how rows relate to each other), \Sigma holds the importance weights (how much each pattern matters), and V* holds the column-level patterns (how columns relate to each other).
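To make the shapes and properties above concrete, here is a minimal NumPy sketch. The 4×3 matrix is a made-up example, not taken from any dataset in this article.

```python
import numpy as np

# Hypothetical 4x3 example matrix (illustration only)
A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 4.0],
              [1.0, 1.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=True)

print(U.shape, s.shape, Vt.shape)  # (4, 4) (3,) (3, 3)

# U and V have orthonormal columns: U^T U = I and V^T V = I
u_is_orthogonal = np.allclose(U.T @ U, np.eye(4))
v_is_orthogonal = np.allclose(Vt @ Vt.T, np.eye(3))

# Singular values are non-negative and sorted in descending order
s_is_sorted = np.all(s >= 0) and np.all(np.diff(s) <= 0)
```

Note that NumPy returns the singular values as a 1-D array rather than as the full m x n matrix \Sigma.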

Here's an analogy. Imagine you're describing a recipe to someone. You could break it down into three parts: the ingredients (what goes in), the proportions (how much of each), and the steps (how they combine). None of these parts alone recreates the dish, but together they give you everything you need to know. SVD does the same thing with matrices - it separates the "what," "how much," and "how" into distinct components you can independently work with.

What makes SVD stand out in linear algebra is that it works on any matrix. It doesn't need to be square, nor does it need special properties. Any m x n matrix can be decomposed this way, which is why it shows up everywhere in data science.

How SVD Works in Practice

Let's take a close look at how SVD works, starting at the beginning.

Explaining matrix decomposition

Say you have a 3×2 matrix A:

A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}

SVD decomposes this into U (3×3), \Sigma (3×2), and V* (2×2). The columns of U are eigenvectors of A A^T, and the columns of V are eigenvectors of A^T A. The singular values in \Sigma are the square roots of the eigenvalues of A^T A (A A^T has the same non-zero eigenvalues).
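The link to eigenvalues follows from substituting the factorization into the product: for a real matrix, A^T A = V \Sigma^T \Sigma V^T, which is an eigendecomposition with the squared singular values on the diagonal. A quick numerical check of that relationship, as a sketch using the 3×2 example matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Singular values from NumPy's SVD routine
s = np.linalg.svd(A, compute_uv=False)

# Eigenvalues of the symmetric 2x2 matrix A^T A (eigvalsh returns ascending order)
eigvals = np.linalg.eigvalsh(A.T @ A)
s_from_eig = np.sqrt(eigvals[::-1])  # flip to descending, then take square roots

matches = np.allclose(s, s_from_eig)

# A A^T is 3x3: its two largest eigenvalues match those of A^T A, the third is ~0
eigvals_big = np.linalg.eigvalsh(A @ A.T)[::-1]
shared = np.allclose(eigvals_big[:2], eigvals[::-1])
```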

The good news is that you don't need to compute these by hand. In Python, all you need is one line of code:

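import numpy as np

# The 3×2 example matrix A
A = np.array([[1, 2], [3, 4], [5, 6]])
# sigma is a 1-D array of singular values; Vt is V* (the transpose of V)
U, sigma, Vt = np.linalg.svd(A, full_matrices=True)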

[Figure: NumPy output]

The three matrices interact through multiplication. Applied to a vector and read from right to left, V* rotates (or reflects) it, \Sigma scales it along each axis, and U rotates the result into the output space. Multiplying all three together gives back the original matrix A.

Role of singular values

The diagonal values in \Sigma tell you how much each component contributes to the overall matrix.

The first singular value is always the largest - it captures the most dominant pattern in the data. Each subsequent value captures less. If the first few singular values are large and the rest are close to zero, it means most of the information in the matrix is concentrated in just a few components.

This is what makes data compression possible.

You can exclude the small singular values (and their matching columns in U and rows in V*) without losing much information. The result is a lower-rank approximation of the original matrix that's smaller and faster to work with.

The number of non-zero singular values also tells you the rank of the matrix - the number of linearly independent rows or columns. If a 100×50 matrix has only 10 non-zero singular values, it means the data has only 10 independent dimensions. The other 40 are redundant.
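Here is a small sketch of reading the rank off the singular values. The 100×50 matrix is synthetic, built to have rank 10, and the tolerance is an assumption that mirrors the default used by np.linalg.matrix_rank.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 100x50 matrix of rank 10 as a product of two thin random matrices
B = rng.standard_normal((100, 10))
C = rng.standard_normal((10, 50))
A = B @ C

s = np.linalg.svd(A, compute_uv=False)

# In floating point the "zero" singular values come out as tiny numbers,
# so count the ones above a small tolerance relative to the largest
tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
rank = int(np.sum(s > tol))

print(rank)                       # 10
print(np.linalg.matrix_rank(A))   # 10
```

In real data you rarely get exact zeros, so "non-zero" in practice means "above a tolerance you choose".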

Reconstructing the matrix

You can rebuild the original matrix by multiplying the three components back together:

A = U \Sigma V^*
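In NumPy, the full reconstruction could look like the sketch below. The only extra step is turning the 1-D array of singular values back into a 3×2 matrix.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# np.linalg.svd returns the singular values as a 1-D array,
# so place them on the diagonal of a 3x2 matrix first
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

A_rebuilt = U @ Sigma @ Vt
is_same = np.allclose(A, A_rebuilt)
```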

But what you really want is partial reconstruction. So, instead of using all singular values, you keep only the top k values and their corresponding vectors. This gives you a rank-k approximation of A:

A_k = U_k \Sigma_k V_k^*

The Eckart-Young theorem guarantees that this rank-k approximation is the closest possible matrix of rank at most k to the original A (measured by the Frobenius norm, and also by the spectral norm). In other words, if you're going to compress a matrix down to rank k, SVD gives you the best possible result.
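A short sketch of the rank-k truncation follows. The random 8×6 matrix and k = 2 are arbitrary choices for illustration. It also checks a known consequence of the theorem: the Frobenius error equals the square root of the sum of the squared singular values you dropped.

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((8, 6))  # hypothetical data matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Keep the first k columns of U, the first k singular values, and the first k rows of V*
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius error of the truncation ...
error = np.linalg.norm(A - A_k, ord="fro")
# ... equals the root of the sum of squared discarded singular values
expected_error = np.sqrt(np.sum(s[k:] ** 2))
```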

Applications of SVD in Data Science

Once you start looking, SVD shows up in more places than you'd expect.

The idea is to always take a big matrix, keep the parts that matter, and remove the rest. What changes is what "matters" means depending on the problem.

Dimensionality reduction

High-dimensional datasets are hard to work with and interpret. More features mean longer training times and a higher risk of overfitting. SVD helps by reducing the number of dimensions.

Here's how, broadly speaking. You decompose your data matrix, look at the singular values, and keep only the top k components. The small singular values usually correspond to noise and minor variation, so removing them barely affects the quality of your data. What you're left with is a compact representation that still has most of the original structure.

This is exactly how Principal Component Analysis (PCA) works. PCA centers the data and then runs SVD on the result. The principal components are the right singular vectors, and the squared singular values (divided by the number of samples minus one) give the variance each component explains.
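The PCA connection can be sketched in a few lines of NumPy. The dataset below is synthetic (200 samples, 5 features with made-up scales), and keeping two components is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical dataset: 200 samples, 5 features with different scales
X = rng.standard_normal((200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

components = Vt                                 # rows are the principal directions
explained_variance = s**2 / (X.shape[0] - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Project onto the first two principal components
X_reduced = X_centered @ Vt[:2].T               # shape (200, 2)

# Cross-check against the eigenvalues of the sample covariance matrix
cov_eigvals = np.linalg.eigvalsh(np.cov(X_centered, rowvar=False))[::-1]
```

The cross-check at the end shows that the variances from SVD match the eigenvalues of the covariance matrix, without ever forming that matrix in the SVD route.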

Recommendation systems

Companies like Netflix and Amazon have massive user-item matrices where most entries are empty. A user rates a few movies out of thousands, so the matrix is sparse. SVD is here to fill in the gaps.

The idea is to decompose the ratings matrix into user preferences and item characteristics. The U matrix represents what each user cares about (genre, pacing, tone), and V* represents what each item offers. The singular values in \Sigma scale these factors by importance. When you multiply them back together, you get predicted ratings for movies a user hasn't seen yet.

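In practice, standard SVD doesn't work directly on a ratings matrix with missing entries: every cell needs a value, and filling the gaps with zeros makes the model treat them as real (and very low) ratings. That's why these systems use variations such as truncated SVD on an imputed matrix, or matrix factorization methods that operate only on the observed entries.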
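To illustrate the "observed entries only" idea, here is a minimal sketch of matrix factorization fitted with plain gradient descent. This is not SVD itself and not how any particular company does it. The ratings matrix, the number of latent factors, the learning rate, and the regularization strength are all assumptions chosen for the example.

```python
import numpy as np

# Hypothetical 5-user x 4-item ratings matrix; np.nan marks a missing rating
R = np.array([[5.0, 4.0, np.nan, 1.0],
              [4.0, np.nan, 1.0, 1.0],
              [1.0, 1.0, np.nan, 5.0],
              [np.nan, 1.0, 5.0, 4.0],
              [5.0, 5.0, 1.0, np.nan]])
observed = ~np.isnan(R)
R_filled = np.where(observed, R, 0.0)   # placeholder zeros are masked out below

rng = np.random.default_rng(0)
k = 2                                            # number of latent factors (assumed)
P = 0.1 * rng.standard_normal((R.shape[0], k))   # user factors
Q = 0.1 * rng.standard_normal((R.shape[1], k))   # item factors

lr, reg = 0.01, 0.01                             # learning rate and L2 penalty (assumed)
for _ in range(5000):
    err = observed * (R_filled - P @ Q.T)        # error on observed entries only
    P_step = err @ Q - reg * P
    Q_step = err.T @ P - reg * Q
    P += lr * P_step
    Q += lr * Q_step

R_pred = P @ Q.T                                 # includes predictions for missing cells
rmse = np.sqrt(np.mean((R - R_pred)[observed] ** 2))
```

The mask is what separates this from SVD on a zero-filled matrix: missing cells contribute nothing to the error, so the model is never pushed to predict a rating of zero for them.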

Image compression

A grayscale image is just a matrix of pixel values. SVD can compress it by keeping only the most important singular values.

Say you have a 1000×1000 image. Full SVD gives you 1000 singular values. But if you keep only the top 50, you reconstruct the image with just 50 components instead of 1000. The image will look slightly blurry, but recognizable - and the storage drops from 1,000,000 values to 100,050 (50 columns of U with 1,000 values each + 50 singular values + 50 rows of V* with 1,000 values each).
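The storage arithmetic and the compression step can be sketched as follows. The helper names truncated_svd_storage and compress are hypothetical, and the 64×64 "image" is a synthetic pattern standing in for real pixel data.

```python
import numpy as np

def truncated_svd_storage(m, n, k):
    """Values stored for a rank-k SVD of an m x n matrix."""
    return k * m + k + k * n   # k columns of U + k singular values + k rows of V*

full = 1000 * 1000
compressed = truncated_svd_storage(1000, 1000, 50)
print(full, compressed)        # 1000000 100050

def compress(image, k):
    """Rank-k reconstruction of a 2-D array."""
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Synthetic 64x64 "image": a smooth pattern plus mild noise
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 64)
image = np.outer(np.sin(4 * x), np.cos(3 * x)) + 0.05 * rng.standard_normal((64, 64))

# Relative reconstruction error shrinks as k grows
errors = [np.linalg.norm(image - compress(image, k)) / np.linalg.norm(image)
          for k in (1, 5, 20)]
```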

More singular values mean better image quality but less compression. Fewer values mean smaller files but more loss. You get to pick where that line falls based on your use case.

Performance Considerations and Limitations

The bigger your matrix, the more computational cost you’ll face.

Computational cost

Full SVD on an m x n matrix has a time complexity of O(mn²) (assuming m >= n). For small matrices, that's fine. For a matrix with millions of rows and thousands of columns, it is expensive.

Memory is the other bottleneck. Full SVD produces three dense matrices, and storing all of them at once can go past your available RAM.

The fix is to avoid computing full SVD when you don't need it. Truncated SVD computes only the top k singular values and their vectors, which is much faster. In Python, scipy.sparse.linalg.svds and sklearn.decomposition.TruncatedSVD both do this. Randomized SVD goes even further by using random sampling to approximate the decomposition, and it works well when you only need the dominant components.
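Here is a sketch of truncated SVD with scipy.sparse.linalg.svds on a synthetic sparse matrix. The matrix size, density, and k are arbitrary. The dense cross-check at the end is only feasible because the example is small.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Hypothetical sparse 1000x200 matrix with 2% non-zero entries
A = sparse_random(1000, 200, density=0.02, format="csr", random_state=0)

k = 5
U_k, s_k, Vt_k = svds(A, k=k)          # computes only k singular triplets

# Sort the k singular values in descending order before using them
order = np.argsort(s_k)[::-1]
U_k, s_k, Vt_k = U_k[:, order], s_k[order], Vt_k[order, :]

# Cross-check against the dense routine
s_full = np.linalg.svd(A.toarray(), compute_uv=False)
top_k_match = np.allclose(s_k, s_full[:k])
```

I sort the result explicitly because it's safer not to rely on the order in which svds returns the singular values.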

Stability and accuracy

SVD is numerically stable in most cases, but it can struggle with some data patterns.

Highly noisy data is one example. If the signal-to-noise ratio is low, the top singular values won't separate from the noise. You'll end up keeping noise in your approximation or reducing signal when you truncate.

Ill-conditioned matrices are another problem. When the ratio between the largest and smallest singular values is huge (a high condition number), small numerical errors during computation are amplified. This can produce unreliable results, especially with floating-point precision limits.
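You can read the condition number straight off the singular values. The sketch below uses a made-up matrix whose second column is almost a copy of the first, which makes it badly conditioned.

```python
import numpy as np

# Hypothetical matrix whose second column is almost a copy of the first
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-10],
              [1.0, 1.0 - 1e-10]])

s = np.linalg.svd(A, compute_uv=False)
condition_number = s[0] / s[-1]          # ratio of largest to smallest singular value

# np.linalg.cond uses the same 2-norm definition by default
same_as_numpy = np.isclose(condition_number, np.linalg.cond(A))
```

A condition number this large (well above a billion here) is a warning sign that anything depending on the smallest singular value, such as a pseudo-inverse, will be sensitive to rounding.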

The fix is to inspect your singular values before truncating. Plot them and look for a clear drop-off between signal and noise. If the decay is gradual with no obvious elbow, SVD might not be the best tool for that dataset.
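If you'd rather check numbers than eyeball a plot, one possible sketch is below. The data is synthetic (a rank-3 signal plus small noise), and both the 99% energy threshold and the "largest ratio between neighbours" elbow rule are assumptions, not standard cut-offs.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical data: rank-3 signal plus a small amount of noise
signal = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 40))
A = signal + 0.05 * rng.standard_normal((200, 40))

s = np.linalg.svd(A, compute_uv=False)

# Share of total "energy" (squared singular values) captured by the top components
energy = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(energy, 0.99) + 1)   # smallest k reaching 99% (threshold assumed)

# Ratio between consecutive singular values; a large ratio marks an elbow
gaps = s[:-1] / s[1:]
elbow = int(np.argmax(gaps) + 1)
```

On data with a gradual decay, the largest ratio will be small and not much bigger than the others, which is the numerical version of "no obvious elbow".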

Alternatives to SVD

SVD isn't the only matrix decomposition out there, and it's not always the best pick for every job.

Each alternative I’ll list below solves a specific kind of problem. They're not replacements for SVD because they work under different assumptions and constraints. The right choice, as always, depends on the task you’re trying to do.

Eigendecomposition

Eigendecomposition is the factorization most closely related to SVD. It breaks a square matrix into eigenvalues and eigenvectors:

A = Q \Lambda Q^{-1}

Where Q holds the eigenvectors and \Lambda is a diagonal matrix of eigenvalues.

The catch is that it only works on square matrices (and, strictly speaking, only on diagonalizable ones). If your data matrix is m x n where m != n, eigendecomposition can't work with it directly. SVD works on any matrix shape, which is why it's the more general tool.

For square, symmetric matrices (like covariance matrices), eigendecomposition and SVD produce closely related results. The singular values of a symmetric positive semi-definite matrix are its eigenvalues. So if you're working with covariance matrices in PCA, both methods get you the same results. SVD is just the version that generalizes to non-square cases.
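A quick sketch comparing the two on a covariance matrix follows. The data is random and only there to produce a symmetric positive semi-definite matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 4))            # hypothetical data
C = np.cov(X, rowvar=False)                 # 4x4 symmetric positive semi-definite

eigvals, eigvecs = np.linalg.eigh(C)        # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # flip to descending

U, s, Vt = np.linalg.svd(C)

same_values = np.allclose(s, eigvals)
# Vectors agree up to sign, so compare absolute values of their dot products
same_vectors = np.allclose(np.abs(np.sum(U * eigvecs, axis=0)), 1.0)
```

The sign ambiguity is expected: flipping the sign of an eigenvector (or of a matching pair of singular vectors) gives an equally valid decomposition.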

QR decomposition

QR decomposition splits a matrix into an orthogonal matrix Q and an upper triangular matrix R:

A = QR

It's faster than SVD for certain tasks, especially for solving systems of linear equations and least-squares problems.

The tradeoff is information. QR doesn't give you singular values, so it can't tell you which components carry the most weight, and without column pivoting it isn't a reliable way to determine the rank of your matrix. If you need to solve Ax = b and don't care about the underlying structure, QR is a good option. But if you need to understand or compress the data, SVD is the better choice.
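For the least-squares case, a sketch of the QR route next to the SVD route could look like this. The system is random and overdetermined (20 equations, 3 unknowns), chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((20, 3))      # hypothetical overdetermined system
b = rng.standard_normal(20)

Q, R = np.linalg.qr(A)                # reduced QR: Q is 20x3, R is 3x3 upper triangular
# np.linalg.solve is a general solver; a triangular solver would exploit R's structure
x_qr = np.linalg.solve(R, Q.T @ b)    # solve R x = Q^T b

# The SVD route gives the same least-squares solution via the pseudo-inverse
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / s)

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
```

Both routes agree here, but only the SVD route leaves you with the singular values s to inspect afterwards.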

Non-negative Matrix Factorization (NMF)

NMF decomposes a matrix into two matrices where all values are non-negative:

A \approx WH

This constraint makes NMF a great fit for data that's inherently non-negative (think pixel intensities or word counts). On the other hand, SVD doesn't force this. Its decomposed matrices can have negative values, which sometimes produces components that are hard to interpret.

NMF is especially popular in text mining and topic modeling. Each column of W can represent a topic, and each row of H shows how much of that topic appears in each document. The non-negative constraint means topics are built from additive combinations of words, which makes them easier to read than SVD's mixed-sign components.

The downside is that NMF doesn't guarantee a unique solution, and its results depend on initialization. SVD is deterministic: the singular values are unique, and the singular vectors are unique up to sign when the singular values are distinct.
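To show where the initialization dependence comes from, here is a minimal NMF sketch using multiplicative updates (the Lee-Seung style rule for the Frobenius objective). This is one of several ways to fit NMF and not necessarily what a given library uses by default. The nmf helper, the word-count matrix, and the iteration count are all made up for the example.

```python
import numpy as np

def nmf(A, k, n_iter=500, seed=0, eps=1e-10):
    """Approximate non-negative A as W @ H using multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k))   # random non-negative start
    H = rng.random((k, A.shape[1]))
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical word-count matrix: 6 words (rows) x 5 documents (columns)
A = np.array([[3, 2, 0, 0, 1],
              [2, 3, 0, 0, 0],
              [4, 1, 0, 1, 2],
              [0, 0, 3, 2, 1],
              [0, 0, 2, 4, 1],
              [0, 1, 1, 3, 2]], dtype=float)

W1, H1 = nmf(A, k=2, seed=0)
W2, H2 = nmf(A, k=2, seed=1)   # different initialization, possibly different factors

rel_error = np.linalg.norm(A - W1 @ H1) / np.linalg.norm(A)
```

The random starting point is the only difference between the two runs, which is why fixing a seed matters if you want reproducible topics.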

Randomized SVD

If your matrix is too large for full SVD but you still want singular values, randomized SVD is worth a look. It uses random projections to approximate the top k singular values and vectors without computing the full decomposition. Libraries like scikit-learn (TruncatedSVD) and Facebook’s fbpca implement this approach, and it scales well to matrices with millions of rows.
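The core idea fits in a few lines of NumPy. The sketch below follows the basic random range finder with a couple of power iterations. It is a simplified illustration, not the implementation used by scikit-learn or fbpca, and the function name, oversampling amount, and test matrix are assumptions.

```python
import numpy as np

def randomized_svd(A, k, n_oversamples=10, n_power_iter=2, seed=0):
    """Approximate the top-k SVD of A with a random range finder."""
    rng = np.random.default_rng(seed)
    # 1. Sample the column space of A with a random Gaussian test matrix
    Omega = rng.standard_normal((A.shape[1], k + n_oversamples))
    Y = A @ Omega
    # 2. A few power iterations help when singular values decay slowly
    for _ in range(n_power_iter):
        Y = np.linalg.qr(Y)[0]
        Y = A @ (A.T @ Y)
    Q = np.linalg.qr(Y)[0]             # orthonormal basis for the sampled range
    # 3. Run an exact SVD on the much smaller projected matrix
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
# Hypothetical 500x200 matrix: rank-5 signal plus slight noise
A = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 200))
A += 0.01 * rng.standard_normal(A.shape)

U_k, s_k, Vt_k = randomized_svd(A, k=5)
s_exact = np.linalg.svd(A, compute_uv=False)[:5]
```

The expensive SVD only ever runs on a (k + n_oversamples) x n matrix, which is where the speed-up on very tall matrices comes from.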

The table below recaps when to go for each method.

[Table: Alternatives to SVD]

Other Considerations With SVD

A couple of common things confuse most new data scientists.

The first is misreading singular values. A large singular value means that component explains a lot of variance in the data - it doesn't mean that component is "important" in a domain-specific sense. For example, the dominant singular value in a user-ratings matrix might capture the fact that most people rate popular movies, not any meaningful preference pattern. Always interpret singular values in the context of your data, not just their magnitude.

The second is reaching for SVD when you don't need it. On small datasets (a few hundred rows and a handful of columns), SVD just adds unnecessary complexity. Simple methods like correlation analysis or basic feature selection often do the job faster and with less code. SVD is great when you have high-dimensional data with redundant structure - if your dataset doesn't fit that description, go with simpler methods.

Conclusion

SVD breaks any matrix into three components that show its structure. The singular values tell you which parts of the data matter most, and the left and right singular vectors show you the row and column patterns behind them.

That decomposition is behind many practical tools you use daily. Recommendation systems use it to predict missing ratings. Image compression uses it to reduce file sizes while keeping visual quality. The math behind them is nearly identical, even though the domain is completely different.

But SVD isn't always the right tool. It's expensive on large matrices and can mix signal with noise when singular values don't separate well. Also, it's overkill for small datasets. Alternatives like QR decomposition, eigendecomposition, and NMF each handle specific cases better.

The key is knowing when you should use SVD and when something simpler will do better. And to get that knowledge, enroll in our Machine Learning Scientist in Python track and get job-ready in 2026.