Introduction to Anomaly Detection in R

Learn statistical tests for identifying outliers and how to use sophisticated anomaly scoring algorithms.

4 hours, 13 videos, 47 exercises

6,795 learners

Statement of Accomplishment

Course Description

Are you concerned about inaccurate or suspicious records in your data, but not sure where to start? An anomaly detection algorithm could help! Anomaly detection is a collection of techniques designed to identify unusual data points; these techniques are crucial for detecting fraud and for protecting computer networks from malicious activity. In this course, you'll explore statistical tests for identifying outliers and learn to use sophisticated anomaly scoring algorithms such as the local outlier factor and isolation forest. You'll apply anomaly detection algorithms to identify unusual wines in the UCI Wine Quality dataset and to detect cases of thyroid disease from abnormal hormone measurements.

1. Statistical outlier detection (Free)
In this chapter, you'll learn how numerical and graphical summaries can be used to informally assess whether data contain unusual points. You'll use a statistical procedure called Grubbs' test to check whether a point is an outlier, and learn about the Seasonal-Hybrid ESD algorithm, which can help identify outliers when the data are a time series.
What do we mean when we talk about anomalies? (50 xp)
Recognizing anomaly types (50 xp)
Exploring the river nitrate data (100 xp)
Testing the extremes with Grubbs' test (50 xp)
Visual check of normality (100 xp)
Grubbs' test (100 xp)
Hunting multiple outliers using Grubbs' test (100 xp)
Anomalies in time series (50 xp)
Visual assessment of seasonality (100 xp)
Seasonal-Hybrid ESD algorithm (100 xp)
Interpreting Seasonal-Hybrid ESD output (100 xp)
Seasonal-Hybrid ESD versus Grubbs' test (50 xp)
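
The Grubbs' test mentioned in this chapter's description compares the most extreme deviation from the mean, measured in units of the sample standard deviation, with a critical value built from the t distribution, and it assumes the data are otherwise roughly normal. The course works in R; the Python sketch below only illustrates the arithmetic. It assumes SciPy is installed (for `scipy.stats.t.ppf`) and uses the two-sided form with `alpha = 0.05`; the function name `grubbs_test` and the sample readings are hypothetical.

```python
import math
import statistics

from scipy import stats


def grubbs_test(values, alpha=0.05):
    """Two-sided Grubbs' test for a single outlier (illustrative sketch).

    Returns (index of the most extreme point, G statistic, critical value,
    whether G exceeds the critical value). Assumes the data are roughly
    normal apart from the suspected outlier.
    """
    n = len(values)
    if n < 3:
        raise ValueError("Grubbs' test needs at least 3 observations")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    # Point with the largest absolute deviation from the mean
    idx = max(range(n), key=lambda i: abs(values[i] - mean))
    g = abs(values[idx] - mean) / sd
    # Upper critical value of t with n - 2 degrees of freedom at alpha / (2n)
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / math.sqrt(n) * math.sqrt(t**2 / (n - 2 + t**2))
    return idx, g, g_crit, g > g_crit


readings = [2.1, 2.3, 1.9, 2.0, 2.2, 2.1, 9.5]  # hypothetical measurements
idx, g, g_crit, is_outlier = grubbs_test(readings)
print(idx, round(g, 3), round(g_crit, 3), is_outlier)
```

Re-running the test after removing a flagged point is one way to hunt for several outliers in turn, although repeated testing changes the overall error rate.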
2. Distance- and density-based anomaly detection
In this chapter, you'll learn how to calculate the k-nearest neighbors distance and the local outlier factor, which are used to construct continuous anomaly scores for each data point when the data have multiple features. You'll learn the difference between local and global anomalies and how the two algorithms can help in each case.
k-nearest neighbors distance score (50 xp)
Exploring wine (100 xp)
kNN distance matrix (100 xp)
kNN distance score (100 xp)
Visualizing kNN distance (50 xp)
Standardizing features (100 xp)
Appending the kNN score (100 xp)
Visualizing kNN distance score (100 xp)
Local outlier factor (50 xp)
LOF calculation (100 xp)
LOF visualization (100 xp)
LOF vs kNN (100 xp)
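
Both scores described for this chapter can be sketched directly from their definitions. This Python illustration (the course itself uses R) assumes Euclidean distance on features that are already standardized, defines the kNN score as the mean distance to the k nearest neighbors (one common convention), and computes LOF from reachability distances. The function names, `k = 2` and the toy points are hypothetical; ties between equally distant neighbors are broken by index, and duplicate points would make the LOF division undefined.

```python
import math


def knn_lists(points, k):
    """For each point, return (distance, index) pairs for its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append(dists[:k])
    return neighbors


def knn_score(points, k=2):
    """kNN distance score: mean distance to the k nearest neighbors."""
    return [sum(d for d, _ in nn) / k for nn in knn_lists(points, k)]


def lof_score(points, k=2):
    """Local outlier factor; values well above 1 suggest a local anomaly."""
    nn = knn_lists(points, k)
    k_dist = [lst[-1][0] for lst in nn]  # distance to each point's k-th neighbor
    # Local reachability density: inverse of the mean reachability distance
    lrd = []
    for lst in nn:
        reach = [max(k_dist[j], d) for d, j in lst]
        lrd.append(1.0 / (sum(reach) / k))
    # LOF: average ratio of the neighbors' densities to the point's own density
    return [sum(lrd[j] for _, j in lst) / (k * lrd[i]) for i, lst in enumerate(nn)]


data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]  # toy, pre-scaled
print([round(s, 2) for s in knn_score(data)])
print([round(s, 2) for s in lof_score(data)])
```

The isolated point at (5, 5) receives the largest value under both scores, while the LOF values of the clustered points stay close to 1.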
3. Isolation forest
The k-nearest neighbors distance and the local outlier factor score each point using the distance to, or the relative density of, its nearest neighbors. In this chapter, you'll explore an alternative, tree-based approach called an isolation forest: a fast and robust anomaly detection method that measures how easily points can be separated by randomly splitting the data into smaller and smaller regions.
Isolation trees (50 xp)
Fit and predict with an isolation tree (100 xp)
Score interpretation (50 xp)
Isolation forest (50 xp)
Fit an isolation forest (100 xp)
Checking convergence (100 xp)
Visualizing the isolation score (50 xp)
A grid of points (100 xp)
Prediction over a grid (100 xp)
Anomaly contours (100 xp)
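
The isolation idea can be illustrated with a bare-bones forest. In this Python sketch (the course uses R), each tree repeatedly picks a random feature and a random split value between that feature's minimum and maximum; points isolated after few splits have short average path lengths and therefore scores near 1. The normalizing constant `c_factor` and the score 2^(-mean depth / c(n)) follow the usual isolation forest formulation, while the function names, 100 trees, a subsample size of 256, the fixed seed and the toy data are assumptions. As a simplification, a node also stops splitting when the randomly chosen feature is constant.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Grow an isolation tree by splitting on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop if the chosen feature is constant here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for points left unseparated there."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly scores in (0, 1]; values near 1 suggest anomalies. Needs >= 2 points."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi))  # height limit for each tree
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    scores = []
    for x in points:
        mean_depth = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / c_factor(psi)))
    return scores


cluster = [(0.1 * (i % 5), 0.1 * (i // 5)) for i in range(20)]  # toy cluster
data = cluster + [(5.0, 5.0)]                                    # plus one far point
scores = isolation_scores(data)
print(max(range(len(data)), key=scores.__getitem__))  # index of the highest score
```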
4. Comparing performance
You've now been introduced to several algorithms for anomaly scoring. In this final chapter, you'll learn to compare the detection performance of these algorithms when labeled anomalies are available. You'll learn to calculate and interpret the precision and recall of an anomaly score, and how to adapt the algorithms so they can accommodate data with categorical features.
Labeled anomalies (50 xp)
Thyroid data (100 xp)
Visualizing thyroid disease (100 xp)
Anomaly score (100 xp)
Measuring performance (50 xp)
Binarized scores (100 xp)
Cross-tabulate binary scores (100 xp)
Thyroid precision and recall (100 xp)
Working with categorical features (50 xp)
Converting character to factor (100 xp)
Isolation forest with factors (100 xp)
LOF with factors (100 xp)
Wrap-up (50 xp)
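
With labeled anomalies, a score can be binarized at a threshold and compared with the labels: precision is the fraction of flagged points that are true anomalies, and recall is the fraction of true anomalies that are flagged. The Python sketch below (the course does this in R) shows only the arithmetic; the threshold of 0.75 and the toy scores and labels are made up, and returning 0.0 when a ratio is undefined is a convention chosen for the example.

```python
def binarize(scores, threshold):
    """Flag a point as anomalous (1) when its score is at or above the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def precision_recall(predicted, labels):
    """Precision and recall of binary predictions against 0/1 labels."""
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


scores = [0.9, 0.2, 0.8, 0.1, 0.7, 0.3]  # hypothetical anomaly scores
labels = [1, 0, 0, 0, 1, 0]              # hypothetical ground truth
flags = binarize(scores, threshold=0.75)
print(flags)                            # [1, 0, 1, 0, 0, 0]
print(precision_recall(flags, labels))  # (0.5, 0.5)
```

Lowering the threshold typically raises recall at the cost of precision, which is why the choice of cut-off matters when comparing algorithms.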

Datasets

Furniture, Wine, Thyroid

Collaborators

Chester Ismay

Amy Peterson

Prerequisites

Intermediate R

DataCamp Content Creator

Course Instructor

DataCamp offers interactive R, Python, Spreadsheets, SQL, and shell courses on topics in data science, statistics, and machine learning. Learn from a team of expert teachers in the comfort of your browser, with video lessons and fun coding challenges and projects.

FAQs

Is this course suitable for beginners?

Will I receive a certificate at the end of the course?

Who will benefit from this course?

What kind of anomaly detection algorithms will be discussed?

What type of data will be discussed?

What type of summaries will be used to find outliers?

What is the difference between local and global anomalies?

How will the algorithms be compared?
