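Course

Anomaly Detection in R

Intermediate skill level

Updated September 2024

Learn statistical tests for identifying outliers and how to use advanced anomaly-scoring techniques.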

R

Probability & Statistics

4 hours

13 videos

47 exercises

3,900 XP

7,336

Statement of Accomplishment

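Course Description

Worried that your data may contain inaccurate or suspicious records, but not sure where to start? Anomaly detection algorithms can help. Anomaly detection is an umbrella term for techniques that identify data points that differ from the norm, and it is essential for fraud detection and for protecting computer networks from malicious activity. In this course, you'll learn statistical tests for identifying outliers and how to use advanced anomaly-scoring methods such as the local outlier factor and isolation forest. You'll apply anomaly detection algorithms to tasks such as finding unusual wines in the UCI Wine Quality dataset and detecting possible cases of thyroid disease from abnormal hormone measurements.

Prerequisites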

1

Statistical outlier detection

In this chapter, you'll learn how numerical and graphical summaries can be used to informally assess whether data contain unusual points. You'll use a statistical procedure called Grubbs' test to check whether a point is an outlier, and learn about the Seasonal-Hybrid ESD algorithm, which can help identify outliers when the data are a time series.
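
Grubbs' test asks whether the single most extreme point is too far from the rest of the sample to be plausible under normality. The course carries this out in R. The sketch below is only a rough illustration in plain Python (standard library only) of the classic two-sided version, and it assumes the data are approximately normal. It computes G = max|x_i - mean| / s and converts G to a Student's t value with n - 2 degrees of freedom. It then reports a p-value multiplied by n, so that p < alpha agrees with the usual alpha/(2n) critical value. The function names (`grubbs_test`, `t_two_sided_p`) are hypothetical. The t tail probability comes from the standard continued-fraction form of the regularized incomplete beta function.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """P(|T| > |t|) for a Student's t variable with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def grubbs_test(values, alpha=0.05):
    """Two-sided Grubbs' test for one outlier; assumes roughly normal data.

    Returns (index of the most extreme point, G statistic, p-value, flagged).
    """
    n = len(values)
    if n < 3:
        raise ValueError("Grubbs' test needs at least 3 observations")
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0.0:
        raise ValueError("all values are identical")
    idx = max(range(n), key=lambda i: abs(values[i] - mean))
    g = abs(values[idx] - mean) / sd
    # Invert G_crit = (n-1)/sqrt(n) * sqrt(t^2 / (n-2+t^2)) to get t from G.
    denom = (n - 1) ** 2 - n * g * g
    if denom <= 0.0:
        p = 0.0  # G is at its theoretical maximum
    else:
        t = math.sqrt(n * (n - 2) * g * g / denom)
        # Multiply by n to match the alpha/(2n) critical value of the classic test.
        p = min(1.0, n * t_two_sided_p(t, n - 2))
    return idx, g, p, p < alpha


if __name__ == "__main__":
    sample = [2.1, 2.3, 1.9, 2.0, 2.2, 2.4, 1.8, 2.1, 9.5]
    idx, g, p, flagged = grubbs_test(sample)
    print(idx, flagged)  # prints: 8 True
```

The test examines only one candidate at a time. Hunting for several outliers therefore means removing a flagged point and testing again, which is the pattern the chapter's "multiple outliers" exercise refers to.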

What do we mean when we talk about anomalies?

Recognizing anomaly types

Exploring the river nitrate data

Testing the extremes with Grubbs' test

Visual check of normality

Grubbs' test

Hunting multiple outliers using Grubbs' test

Anomalies in time series

Visual assessment of seasonality

Seasonal-Hybrid ESD algorithm

Interpreting Seasonal-Hybrid ESD output

Seasonal-Hybrid ESD versus Grubbs' test

2

Distance- and density-based anomaly detection

In this chapter, you'll learn how to calculate the k-nearest neighbors distance and the local outlier factor, which are used to construct continuous anomaly scores for each data point when the data have multiple features. You'll learn the difference between local and global anomalies and how the two algorithms can help in each case.
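
Both scores fit in a few lines. The sketch below is plain Python (standard library only, Python 3.8+ for `math.dist`), not the R code used in the course, and all function names are hypothetical. It first standardizes each feature so that no single unit dominates the Euclidean distance. It then defines the kNN distance score as the mean distance to the k nearest neighbors. This is one common variant; some implementations use the distance to the k-th neighbor instead. Finally, it computes the local outlier factor from reachability distances, following the usual LOF definition. For simplicity the sketch assumes no duplicate points, no constant features, and k smaller than the number of rows. It does not expand a neighborhood when distances are tied.

```python
import math


def standardize(rows):
    """Scale each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def knn(rows, k):
    """For each point, the k nearest other points as (distance, index) pairs."""
    out = []
    for i, p in enumerate(rows):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(rows) if j != i)
        out.append(dists[:k])
    return out


def knn_distance_score(rows, k):
    """Mean distance to the k nearest neighbors; larger means more isolated."""
    return [sum(d for d, _ in nb) / k for nb in knn(rows, k)]


def lof(rows, k):
    """Local outlier factor using exactly k neighbors per point."""
    neighbors = knn(rows, k)
    k_dist = [nb[-1][0] for nb in neighbors]  # distance to the k-th neighbor
    lrd = []
    for nb in neighbors:
        # Reachability distance to neighbor o: max(k-distance of o, actual distance).
        reach = [max(k_dist[j], d) for d, j in nb]
        lrd.append(k / sum(reach))  # local reachability density
    # Ratio of the neighbors' average density to the point's own density.
    return [sum(lrd[j] for _, j in nb) / (k * lrd[i])
            for i, nb in enumerate(neighbors)]


if __name__ == "__main__":
    data = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2], [1.0, 0.8], [5.0, 5.0]]
    z = standardize(data)
    print([round(s, 2) for s in knn_distance_score(z, 2)])
    print([round(s, 2) for s in lof(z, 2)])
```

An LOF near 1 means a point is about as dense as its neighbors, and values well above 1 mark it as sparser than its surroundings. That relative comparison is what lets LOF pick up local anomalies that a purely distance-based kNN score can miss.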

k-nearest neighbors distance score

Exploring wine

kNN distance matrix

kNN distance score

Visualizing kNN distance

Standardizing features

Appending the kNN score

Visualizing kNN distance score

Local outlier factor

LOF calculation

LOF visualization

3

Isolation forest

The k-nearest neighbors distance and the local outlier factor use the distance to, or the relative density of, the nearest neighbors to score each point. In this chapter, you'll explore an alternative tree-based approach called an isolation forest, a fast and robust method for detecting anomalies that measures how easily points can be separated by randomly splitting the data into smaller and smaller regions.
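
A small from-scratch implementation can illustrate the splitting idea. This is a Python sketch (standard library only) of the standard isolation forest scheme, not the R package used in the course. The function names and default parameters (`n_trees=100`, `sample_size=256`) are assumptions made for illustration. Each tree recursively splits a random subsample on a randomly chosen feature at a uniformly random value, up to a depth limit of ceil(log2(subsample size)). A point's path length is the depth of the leaf it reaches, plus a correction c(size) for any points left unseparated in that leaf. The anomaly score is 2^(-E[h(x)] / c(n)), where n is the subsample size, and the sketch assumes at least two data points.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random value until isolated or too deep."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    spread = [d for d in range(len(points[0]))
              if min(p[d] for p in points) < max(p[d] for p in points)]
    if not spread:  # all remaining points are identical
        return {"size": len(points)}
    dim = rng.choice(spread)
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(x, node, depth=0):
    """Depth of the leaf x falls into, plus c(size) for unresolved points there."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_forest_scores(data, n_trees=100, sample_size=256, seed=0):
    """Score each point in data; values near 1 suggest anomalies."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(psi)
    scores = []
    for x in data:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / norm))
    return scores


if __name__ == "__main__":
    cluster = [[i * 0.1, j * 0.1] for i in range(5) for j in range(5)]
    data = cluster + [[3.0, 3.0]]
    scores = isolation_forest_scores(data, n_trees=100, seed=1)
    print(max(range(len(data)), key=scores.__getitem__))  # index of the highest score
```

Scores close to 1 point to easily isolated, anomalous points, while scores well below 0.5 indicate ordinary ones. Averaging over many trees is what stabilizes the score, which is why checking convergence as the number of trees grows is worthwhile.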

Isolation trees

Fit and predict with an isolation tree

Score interpretation

Isolation forest

Fit an isolation forest

Checking convergence

Visualizing the isolation score

A grid of points

Prediction over a grid

Anomaly contours

4

Comparing performance

You've now been introduced to a few different algorithms for anomaly scoring. In this final chapter, you'll learn to compare the detection performance of the algorithms in instances where labeled anomalies are available. You'll learn to calculate and interpret the precision and recall statistics for an anomaly score, and how to adapt the algorithms so they can accommodate data with categorical features.
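
When labeled anomalies are available, precision and recall follow from a 2 x 2 cross-tabulation of the true labels against binarized scores. The Python sketch below (standard library only; not the course's R code) flags a point when its score reaches a threshold. Precision is TP / (TP + FP), the share of flagged points that are true anomalies. Recall is TP / (TP + FN), the share of true anomalies that were flagged. The threshold, the toy labels and scores, and the function names are all hypothetical; in practice the threshold is often set at a high quantile of the scores.

```python
def binarize(scores, threshold):
    """Flag a point as anomalous (1) when its score is at or above the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def cross_tab(labels, flags):
    """Count (true label, predicted flag) pairs in a 2 x 2 table."""
    tab = {(y, f): 0 for y in (0, 1) for f in (0, 1)}
    for y, f in zip(labels, flags):
        tab[(y, f)] += 1
    return tab


def precision_recall(labels, flags):
    """Precision and recall of binary flags against 0/1 anomaly labels."""
    tab = cross_tab(labels, flags)
    tp, fp, fn = tab[(1, 1)], tab[(0, 1)], tab[(1, 0)]
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
    scores = [0.1, 0.2, 0.15, 0.9, 0.7, 0.8, 0.3, 0.05, 0.4, 0.2]
    flags = binarize(scores, threshold=0.6)
    print(precision_recall(labels, flags))  # both equal 2/3 here
```

Raising the threshold usually trades recall for precision, so comparing both numbers at more than one cutoff gives a fairer picture of each algorithm.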

Labeled anomalies

Thyroid data

Visualizing thyroid disease

Anomaly score

Measuring performance

Binarized scores

Cross-tabulate binary scores

Thyroid precision and recall

Working with categorical features

Converting character to factor

Isolation forest with factors

LOF with factors
