Introduction to Anomaly Detection in R

Skill level: Intermediate

Updated 09.2024

Learn statistical tests for identifying outliers and how to use sophisticated anomaly scoring algorithms.

R, Probability & Statistics

4 hours

13 videos

47 exercises

3,900 XP

7,338

Statement of Accomplishment

Course Description

Are you concerned about inaccurate or suspicious records in your data, but not sure where to start? An anomaly detection algorithm could help! Anomaly detection is a collection of techniques designed to identify unusual data points, and it is crucial for detecting fraud and for protecting computer networks from malicious activity. In this course, you'll explore statistical tests for identifying outliers and learn to use sophisticated anomaly scoring algorithms such as the local outlier factor and isolation forest. You'll apply anomaly detection algorithms to identify unusual wines in the UCI Wine Quality dataset and to detect cases of thyroid disease from abnormal hormone measurements.

1

Statistical outlier detection

In this chapter, you'll learn how numerical and graphical summaries can be used to informally assess whether data contain unusual points. You'll use a statistical procedure called Grubbs' test to check whether a point is an outlier, and learn about the Seasonal-Hybrid ESD algorithm, which can help identify outliers when the data are a time series.

What do we mean when we talk about anomalies?

Recognizing anomaly types

Exploring the river nitrate data

Testing the extremes with Grubbs' test

Visual check of normality

Grubbs' test

Hunting multiple outliers using Grubbs' test

Anomalies in time series

Visual assessment of seasonality

Seasonal-Hybrid ESD algorithm

Interpreting Seasonal-Hybrid ESD output

Seasonal-Hybrid ESD versus Grubbs' test
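
As a rough illustration of the Grubbs' test covered in this chapter, the sketch below computes the test statistic and its two-sided critical value by hand in base R. The data vector and the 5% significance level are made-up assumptions, not course material, and the test presumes the data are approximately normal (hence the visual check of normality listed above).

```r
# Hypothetical measurements: values near 10 plus one suspicious reading.
x <- c(9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 14.5)
n <- length(x)

# Grubbs' statistic: largest absolute deviation from the mean, in SD units.
G <- max(abs(x - mean(x))) / sd(x)

# Two-sided critical value at significance level alpha (assumed 0.05),
# based on the t distribution with n - 2 degrees of freedom.
alpha <- 0.05
t_crit <- qt(1 - alpha / (2 * n), df = n - 2)
G_crit <- (n - 1) / sqrt(n) * sqrt(t_crit^2 / (n - 2 + t_crit^2))

# The point being tested is the one farthest from the mean.
candidate <- x[which.max(abs(x - mean(x)))]
is_outlier <- G > G_crit
```

Because the test examines only the single most extreme point, hunting for several outliers means removing the flagged point and repeating the test on the remaining data.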

2

Distance and density based anomaly detection

In this chapter, you'll learn how to calculate the k-nearest neighbors distance and the local outlier factor, which are used to construct continuous anomaly scores for each data point when the data have multiple features. You'll learn the difference between local and global anomalies and how the two algorithms can help in each case.

k-nearest neighbors distance score

Exploring wine

kNN distance matrix

kNN distance score

Visualizing kNN distance

Standardizing features

Appending the kNN score

Visualizing kNN distance score

Local outlier factor

LOF calculation

LOF visualization
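
The kNN distance score from this chapter can be sketched with base R alone. The points below and the choice of k = 3 are hypothetical, and the sketch covers only the kNN distance score, not the local outlier factor. It builds the full distance matrix with `dist()`, which is fine for small data but an assumption of convenience rather than the course's own approach.

```r
# Hypothetical two-feature data: a tight cluster plus one distant point.
pts <- rbind(
  c(1.0, 1.0), c(1.2, 0.9), c(0.9, 1.1), c(1.1, 1.2), c(0.8, 0.9),
  c(5.0, 5.0)
)

# Standardize so each feature contributes on the same scale.
pts_scaled <- scale(pts)

# Full pairwise Euclidean distance matrix.
d <- as.matrix(dist(pts_scaled))

k <- 3
# For each row, skip the zero self-distance, keep the k smallest
# remaining distances, and average them.
knn_score <- apply(d, 1, function(row) mean(sort(row)[2:(k + 1)]))

# Index of the point with the largest average kNN distance.
most_anomalous <- unname(which.max(knn_score))
```

A larger score means a point sits farther from its nearest neighbors, which makes this score suited to global anomalies; points that are unusual only relative to their local neighborhood are what the local outlier factor targets.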

3

Isolation forest

k-nearest neighbors distance and local outlier factor use the distance or relative density of the nearest neighbors to score each point. In this chapter, you'll explore an alternative tree-based approach called an isolation forest, which is a fast and robust method of detecting anomalies that measures how easily points can be separated by randomly splitting the data into smaller and smaller regions.

Isolation trees

Fit and predict with an isolation tree

Score interpretation

Isolation forest

Fit an isolation forest

Checking convergence

Visualizing the isolation score

A grid of points

Prediction over a grid

Anomaly contours
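
To make the random-splitting idea concrete, here is a minimal base R sketch of isolation scoring. It is a simplification under stated assumptions: instead of fitting trees once on subsamples, it draws a fresh random sequence of splits along each point's path using the full data set, and the helper names `c_factor` and `path_length`, the simulated data, and the tree count are all hypothetical.

```r
# Average path length of an unsuccessful binary search tree lookup among
# n points; used to normalize depths.
c_factor <- function(n) {
  if (n <= 1) return(0)
  if (n == 2) return(1)
  2 * (log(n - 1) + 0.5772156649) - 2 * (n - 1) / n
}

# Follow one point down a randomly grown tree and return its path length.
path_length <- function(point, data, depth = 0, max_depth = 8) {
  n <- nrow(data)
  if (n <= 1 || depth >= max_depth) return(depth + c_factor(n))
  feature <- sample(ncol(data), 1)
  rng <- range(data[, feature])
  if (rng[1] == rng[2]) return(depth + c_factor(n))
  split <- runif(1, rng[1], rng[2])
  # Keep only the rows that fall on the same side of the split as the point.
  same_side <- (data[, feature] < split) == (point[feature] < split)
  path_length(point, data[same_side, , drop = FALSE], depth + 1, max_depth)
}

# Simulated data (assumption): 100 clustered points plus one far away.
set.seed(42)
cluster <- matrix(rnorm(200, mean = 0, sd = 1), ncol = 2)
dat <- rbind(cluster, c(6, 6))

n <- nrow(dat)
n_trees <- 100
max_depth <- ceiling(log2(n))

# Isolation score: 2^(-mean path length / c(n)); higher is more anomalous.
iso_score <- sapply(seq_len(n), function(i) {
  depths <- replicate(n_trees, path_length(dat[i, ], dat, max_depth = max_depth))
  2^(-mean(depths) / c_factor(n))
})

most_anomalous <- which.max(iso_score)
```

Points that are easy to separate need fewer splits, so their average path is short and their score is high. Averaging over many random trees is what stabilizes the score, which is why checking convergence as the number of trees grows is part of this chapter.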

4

Comparing performance

You've now been introduced to a few different algorithms for anomaly scoring. In this final chapter, you'll learn to compare the detection performance of the algorithms in instances where labeled anomalies are available. You'll learn to calculate and interpret the precision and recall statistics for an anomaly score, and how to adapt the algorithms so they can accommodate data with categorical features.

Labeled anomalies

Thyroid data

Visualizing thyroid disease

Anomaly score

Measuring performance

Binarized scores

Cross-tabulate binary scores

Thyroid precision and recall

Working with categorical features

Converting character to factor

Isolation forest with factors

LOF with factors
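
The precision and recall calculation described in this chapter can be sketched as follows. The scores, labels, and the 0.6 threshold are invented for illustration; in practice the threshold would be chosen from the score distribution.

```r
# Hypothetical anomaly scores and true labels (1 = anomaly, 0 = normal).
scores <- c(0.91, 0.15, 0.78, 0.22, 0.40, 0.85, 0.10, 0.66)
labels <- c(1, 0, 1, 0, 1, 0, 0, 0)

# Binarize: flag every score at or above the assumed threshold.
threshold <- 0.6
flagged <- as.numeric(scores >= threshold)

# Cross-tabulate flagged points against the true labels.
confusion <- table(flagged = flagged, actual = labels)

tp <- confusion["1", "1"]  # flagged and truly anomalous
fp <- confusion["1", "0"]  # flagged but normal
fn <- confusion["0", "1"]  # anomalous but missed

# Precision: share of flagged points that are real anomalies.
precision <- as.numeric(tp / (tp + fp))
# Recall: share of real anomalies that were flagged.
recall <- as.numeric(tp / (tp + fn))
```

Raising the threshold flags fewer points, which tends to improve precision at the cost of recall, so the two statistics are best read together when comparing algorithms.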
