
Introduction to t-SNE

Learn to visualize high-dimensional data in a low-dimensional space using a nonlinear dimensionality reduction technique.
Updated Mar 2023  · 14 min read

In this tutorial, we will delve into the workings of t-SNE, a powerful technique for dimensionality reduction and data visualization. We will compare it with another popular technique, PCA, and demonstrate how to perform both t-SNE and PCA using scikit-learn and Plotly Express on synthetic and real-world datasets.


What is t-SNE?

t-SNE (t-distributed Stochastic Neighbor Embedding) is an unsupervised non-linear dimensionality reduction technique for data exploration and visualizing high-dimensional data. Non-linear dimensionality reduction means that the algorithm allows us to separate data that cannot be separated by a straight line. 


t-SNE gives you a feel and intuition for how data is arranged in high-dimensional space. It is often used to visualize complex datasets in two or three dimensions, allowing us to understand more about the underlying patterns and relationships in the data.

Take our Dimensionality Reduction in Python course to learn about exploring high-dimensional data, feature selection, and feature extraction.

t-SNE vs PCA

Both t-SNE and PCA are dimensionality reduction techniques, but they have different mechanisms and work best with different types of data.

PCA (Principal Component Analysis) is a linear technique that works best with data that has a linear structure. It identifies the principal components of the data by projecting it onto lower dimensions in a way that maximizes the retained variance and preserves large pairwise distances. Read our Principal Component Analysis (PCA) tutorial to understand the inner workings of the algorithm with R examples.

t-SNE, by contrast, is a nonlinear technique that focuses on preserving the pairwise similarities between data points in a lower-dimensional space. t-SNE is concerned with preserving small pairwise distances, whereas PCA focuses on maintaining large pairwise distances to maximize variance.

In summary, PCA preserves the variance in the data, whereas t-SNE preserves the relationships between data points in a lower-dimensional space, making it quite a good algorithm for visualizing complex high-dimensional data. 

How t-SNE works

The t-SNE algorithm computes a similarity measure between pairs of instances in both the high-dimensional and the low-dimensional space, and then optimizes the low-dimensional embedding so that the two sets of similarities match. It does this in three steps.

  1. t-SNE models the probability of one point being picked as a neighbor of another point, in both the high-dimensional and the low-dimensional space. It starts by calculating pairwise similarities between all data points in the high-dimensional space using a Gaussian kernel: points that are far apart have a lower probability of being picked than points that are close together.
  2. Then, the algorithm tries to map the high-dimensional data points onto a lower-dimensional space while preserving these pairwise similarities.
  3. This is achieved by minimizing the divergence between the probability distributions of the original high-dimensional space and the lower-dimensional embedding. The algorithm uses gradient descent to minimize the divergence, optimizing the lower-dimensional embedding until it reaches a stable state.

This optimization process produces clusters and sub-clusters of similar data points in the lower-dimensional space, which can be visualized to understand the structure and relationships in the higher-dimensional data.
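
To make these steps concrete, here is a minimal NumPy sketch of the quantities involved. It is an illustration of the math, not scikit-learn's implementation: real t-SNE tunes a separate Gaussian bandwidth per point to match the chosen perplexity and symmetrizes the conditional probabilities, whereas this sketch uses a single fixed bandwidth.

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((5, 6))  # 5 points in 6 dimensions
Y = rng.random((5, 2))  # a (random, not yet optimized) 2D embedding

def squared_distances(A):
    # Pairwise squared Euclidean distances
    diff = A[:, None, :] - A[None, :, :]
    return (diff ** 2).sum(-1)

# Step 1: Gaussian similarities in the high-dimensional space
P = np.exp(-squared_distances(X) / 2)  # fixed bandwidth for simplicity
np.fill_diagonal(P, 0)
P /= P.sum()

# Step 2: Student-t similarities in the low-dimensional space
Q = 1 / (1 + squared_distances(Y))
np.fill_diagonal(Q, 0)
Q /= Q.sum()

# Step 3: the KL divergence that gradient descent minimizes
mask = P > 0
print(np.sum(P[mask] * np.log(P[mask] / Q[mask])))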

t-SNE Python Example

In the Python example, we will generate classification data, perform PCA and t-SNE, and visualize the results. For performing dimensionality reduction, we will use Scikit-Learn, and for visualization, we will use Plotly Express. 

Generating Classification Dataset

We will use Scikit-Learn’s make_classification function to generate synthetic data with 6 features, 1500 samples, and 3 classes. 

After that, we will create a 3D plot of the first three features of the data using the Plotly Express scatter_3d function.

import plotly.express as px
from sklearn.datasets import make_classification

X, y = make_classification(
    n_features=6,
    n_classes=3,
    n_samples=1500,
    n_informative=2,
    random_state=5,
    n_clusters_per_class=1,
)


fig = px.scatter_3d(x=X[:, 0], y=X[:, 1], z=X[:, 2], color=y, opacity=0.8)
fig.show()

We have a 3D plot of the data; you can also visualize the data in a 2D chart by using the Plotly Express scatter function.
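
For example (a quick sketch using the same X and y):

# 2D view of the first two features, colored by class
fig = px.scatter(x=X[:, 0], y=X[:, 1], color=y, opacity=0.8)
fig.show()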


Fitting and Transforming PCA

We will now apply the PCA algorithm to the dataset to return two PCA components. The fit_transform method learns the projection and transforms the dataset in a single step.

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
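
As an optional check (not part of the original output), you can see how much of the total variance the two components retain:

# Fraction of the total variance captured by each principal component
print(pca.explained_variance_ratio_)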

PCA Visualization Python

We can now visualize the results by displaying two PCA components on a scatter plot. 

  • x: First principal component
  • y: Second principal component
  • color: Target variable

We have also used the update_layout function to add a title and rename the x-axis and y-axis.

fig = px.scatter(x=X_pca[:, 0], y=X_pca[:, 1], color=y)
fig.update_layout(
    title="PCA visualization of Custom Classification dataset",
    xaxis_title="First Principal Component",
    yaxis_title="Second Principal Component",
)
fig.show()


Fitting and Transforming t-SNE

Now we will apply the t-SNE algorithm to the dataset and compare the results. 

After fitting and transforming the data, we will display the Kullback-Leibler (KL) divergence between the high-dimensional probability distribution and the low-dimensional probability distribution.

A low KL divergence is a sign of a better embedding.

from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X)
tsne.kl_divergence_
1.1169137954711914

t-SNE Visualization Python

Similar to PCA, we will visualize two t-SNE components on a scatter plot. 

fig = px.scatter(x=X_tsne[:, 0], y=X_tsne[:, 1], color=y)
fig.update_layout(
    title="t-SNE visualization of Custom Classification dataset",
    xaxis_title="First t-SNE",
    yaxis_title="Second t-SNE",
)
fig.show()

The result is much better than PCA's: we can clearly see three big clusters.


t-SNE on Customer Churn Dataset

In this section, we will use the real Customer Churn dataset of an Iranian telecom company. The dataset contains information on the customers' activity, such as call failures and subscription length, and a churn label.

Churn is the percentage of customers who stop using a particular service during a given time frame.

Note: The source code and datasets for both examples are available on DataCamp Workspace.

Importing Customer Churn Dataset

We will load the dataset using pandas and display the first three rows.

import pandas as pd

df = pd.read_csv("data/customer_churn.csv")
df.head(3)
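
Before reducing dimensions, it is worth a quick look at the dataset's size and class balance (a small sketch; the Churn column name is used in the next step):

# Dataset size and churn class balance
print(df.shape)
print(df["Churn"].value_counts(normalize=True))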


PCA Dimensionality Reduction

After that, we will:

  • Create features (X) and target (y) using the Churn column.
  • Normalize the features using a standard scaler.
  • Split the dataset into a training and testing set.
  • Apply PCA to the training dataset.
  • Get the score using the testing dataset. The score represents the average log-likelihood of all samples.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = df.drop('Churn', axis=1)
y = df['Churn']

scaler = StandardScaler()
X_norm = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(
    X_norm, y, random_state=13, test_size=0.25, shuffle=True
)

pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)

pca.score(X_test)
-17.04482851288105

Visualizing PCA

We will now visualize the PCA result using the Plotly Express scatter plot. 

fig = px.scatter(x=X_train_pca[:, 0], y=X_train_pca[:, 1], color=y_train)
fig.update_layout(
    title="PCA visualization of Customer Churn dataset",
    xaxis_title="First Principal Component",
    yaxis_title="Second Principal Component",
)
fig.show()

PCA did not do a good job of creating clusters here; the data in the low-dimensional space looks random. This could mean that the features in the dataset are highly skewed, or that they lack a strong correlation structure.
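
Both hypotheses are easy to check with pandas (a quick sketch, assuming all feature columns are numeric, as StandardScaler requires):

features = df.drop("Churn", axis=1)
print(features.skew())           # large magnitudes suggest skewed features
print(features.corr().round(2))  # weak off-diagonal values suggest little linear structure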


Checking Perplexity vs. Divergence

For the t-SNE algorithm, perplexity is a very important hyperparameter. It controls the effective number of neighbors that each point considers during the dimensionality reduction process. 

We will run a loop to get the KL divergence metric for perplexity values from 5 to 50 in steps of 5. After that, we will display the result using the Plotly Express line plot.

import numpy as np

perplexity = np.arange(5, 55, 5)
divergence = []

for i in perplexity:
    model = TSNE(n_components=2, init="pca", perplexity=i)
    reduced = model.fit_transform(X_train)
    divergence.append(model.kl_divergence_)

fig = px.line(x=perplexity, y=divergence, markers=True)
fig.update_layout(xaxis_title="Perplexity Values", yaxis_title="Divergence")
fig.update_traces(line_color="red", line_width=1)
fig.show()

The KL divergence becomes nearly constant after a perplexity of 40, so we will use a perplexity of 40 for the t-SNE algorithm.


t-SNE Dimensionality Reduction

We will now fit t-SNE and transform the data into lower dimensions using a perplexity of 40 to get a low KL divergence.

from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, perplexity=40, random_state=42)
X_train_tsne = tsne.fit_transform(X_train)

tsne.kl_divergence_
0.258713960647583

Visualizing t-SNE

We will now use the Plotly Express scatter plot to display the components and target classes.

fig = px.scatter(x=X_train_tsne[:, 0], y=X_train_tsne[:, 1], color=y_train)
fig.update_layout(
    title="t-SNE visualization of Customer Churn dataset",
    xaxis_title="First t-SNE",
    yaxis_title="Second t-SNE",
)
fig.show()

As we can see, there are multiple clusters and sub-clusters. We can use this information to understand the patterns in the data and devise a strategy for retaining existing customers, as sketched below.
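
One way to act on these clusters (a hedged sketch, not part of the original analysis) is to label them with k-means on the 2D embedding and compare churn rates per cluster; the choice of 8 clusters is arbitrary and only for illustration.

from sklearn.cluster import KMeans

# Label the visual clusters, then measure the churn rate in each one
clusters = KMeans(n_clusters=8, n_init=10, random_state=42).fit_predict(X_train_tsne)
churn_by_cluster = pd.Series(y_train.values, name="churn").groupby(clusters).mean()

# Clusters with a high churn rate are candidates for retention campaigns
print(churn_by_cluster.sort_values(ascending=False))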


Application of t-SNE

Apart from visualizing complex multi-dimensional data, t-SNE has many other uses, several of them in the medical field.

  1. Clustering and classification: to cluster similar data points together in lower dimensional space. It can also be used for classification and finding patterns in the data. 
  2. Anomaly detection: to identify outliers and anomalies in the data. 
  3. Natural language processing: to visualize word embeddings generated from a large corpus of text, making it easier to identify similarities and relationships between words.
  4. Computer security: to visualize network traffic patterns and detect anomalies.
  5. Cancer research: to visualize molecular profiles of tumor samples and identify subtypes of cancer. 
  6. Geological domain interpretation: to visualize seismic attributes and to identify geological anomalies. 
  7. Biomedical signal processing: to visualize electroencephalogram (EEG) signals and detect patterns of brain activity.

Conclusion

t-SNE is a powerful visualization tool for revealing hidden patterns and structures in complex datasets. You can use it on image, audio, and biological data (such as single-cell data) to identify anomalies and patterns.

In this blog post, we have learned about t-SNE, a popular dimensionality reduction technique that can visualize high-dimensional, non-linear data in a low-dimensional space. We explained the main idea behind t-SNE, how it works, and its applications. We also showed examples of applying t-SNE to synthetic and real datasets and how to interpret the results.

t-SNE is part of unsupervised learning, and the natural next steps are to understand hierarchical clustering, PCA, decorrelation, and discovering interpretable features. You can learn all of these topics in our Unsupervised Learning in Python course.
