Course

Introduction to Anomaly Detection in R

Intermediate skill level

Updated September 2024

Learn how to identify outliers with statistical tests and how to use sophisticated anomaly scoring algorithms.

R, Probability & Statistics

4 hours

13 videos

47 exercises

3,900 XP

7,336

Statement of Accomplishment

Course Description

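Are you worried that your data contain inaccurate or suspicious records, but unsure where to start? Anomaly detection algorithms can help. Anomaly detection is a collection of techniques for identifying unusual data points, and it is essential for tasks such as detecting financial fraud and spotting malicious activity on computer networks. In this course, you'll explore statistical tests for finding outliers and learn sophisticated anomaly scoring algorithms such as the local outlier factor and the isolation forest. You'll also apply anomaly detection algorithms yourself to identify unusual wines in the UCI Wine quality dataset and to detect cases of thyroid disease from abnormal hormone levels.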

Prerequisites

1

Statistical outlier detection

In this chapter, you'll learn how numerical and graphical summaries can be used to informally assess whether data contain unusual points. You'll use a statistical procedure called Grubbs' test to check whether a point is an outlier, and learn about the Seasonal-Hybrid ESD algorithm, which can help identify outliers when the data are a time series.

What do we mean when we talk about anomalies?

Recognizing anomaly types

Exploring the river nitrate data

Testing the extremes with Grubbs' test

Visual check of normality

Grubbs' test

Hunting multiple outliers using Grubbs' test

Anomalies in time series

Visual assessment of seasonality

Seasonal-Hybrid ESD algorithm

Interpreting Seasonal-Hybrid ESD output

Seasonal-Hybrid ESD versus Grubbs' test
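
To make the idea behind Grubbs' test concrete, here is a minimal sketch of its test statistic, the largest absolute deviation from the sample mean divided by the sample standard deviation. It is written in Python with the standard library only and is not the course's R code. The function name `grubbs_statistic` and the `readings` values are hypothetical. Comparing the statistic against a critical value needs a Student's t quantile and is deliberately left out.

```python
import statistics

def grubbs_statistic(values):
    """Return (G, index) for the point farthest from the sample mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    deviations = [abs(v - mean) for v in values]
    idx = max(range(len(values)), key=deviations.__getitem__)
    return deviations[idx] / sd, idx

# Hypothetical measurements, not the course's river nitrate data.
readings = [1.0, 1.1, 0.9, 1.2, 1.0, 5.0]
g_stat, suspect = grubbs_statistic(readings)
print(f"G = {g_stat:.3f}, most extreme point at index {suspect}")
```

A large G flags the candidate point, but the test assumes the remaining data are roughly normal, which is why the chapter pairs it with a visual check of normality.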

2

Distance and density based anomaly detection

In this chapter, you'll learn how to calculate the k-nearest neighbors distance and the local outlier factor, which are used to construct continuous anomaly scores for each data point when the data have multiple features. You'll learn the difference between local and global anomalies and how the two algorithms can help in each case.

k-nearest neighbors distance score

Exploring wine

kNN distance matrix

kNN distance score

Visualizing kNN distance

Standardizing features

Appending the kNN score

Visualizing kNN distance score

Local outlier factor

LOF calculation

LOF visualization
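
The two scores from this chapter can be sketched in a few lines. The example below is written in Python with the standard library only and is not the course's R code. It computes the mean distance to the k nearest neighbors and a simplified local outlier factor. All function names and the `points` data are hypothetical, ties in neighbor distance are broken arbitrarily, and duplicate points (zero distances) are not handled.

```python
import math

def knn_info(points, k):
    """For each point, list (distance, index) of its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result

def knn_distance_score(points, k):
    """Mean distance to the k nearest neighbors (a global score)."""
    return [sum(d for d, _ in nbrs) / k for nbrs in knn_info(points, k)]

def lof_score(points, k):
    """Simplified local outlier factor (a local, density-ratio score)."""
    nbrs = knn_info(points, k)
    k_dist = [n[-1][0] for n in nbrs]  # distance to the k-th neighbor
    lrd = []
    for n in nbrs:
        # Reachability distance: at least the neighbor's own k-distance.
        reach = [max(k_dist[j], d) for d, j in n]
        lrd.append(len(reach) / sum(reach))  # local reachability density
    return [sum(lrd[j] for _, j in n) / (k * lrd[i]) for i, n in enumerate(nbrs)]

# Hypothetical data: four corners of a unit square plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
knn = knn_distance_score(points, k=2)
lof = lof_score(points, k=2)
print([round(s, 2) for s in knn])
print([round(s, 2) for s in lof])
```

A LOF near 1 means a point is about as dense as its neighbors, while values well above 1 mark points that are sparse relative to their neighborhood. Because both scores depend on distances, features on different scales are usually standardized first.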

3

Isolation forest

k-nearest neighbors distance and local outlier factor use the distance or relative density of the nearest neighbors to score each point. In this chapter, you'll explore an alternative tree-based approach called an isolation forest, which is a fast and robust method of detecting anomalies that measures how easily points can be separated by randomly splitting the data into smaller and smaller regions.

Isolation trees

Fit and predict with an isolation tree

Score interpretation

Isolation forest

Fit an isolation forest

Checking convergence

Visualizing the isolation score

A grid of points

Prediction over a grid

Anomaly contours
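
The random splitting described in this chapter can be sketched as a small isolation forest. The example is written in Python with the standard library only and is not the course's R code. Function names, the seeds, and the simulated `data` are assumptions for illustration. One simplification is labeled in the code: a node stops splitting if the randomly chosen feature is constant.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length adjustment for a leaf that still holds n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random cut point."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # simplification: stop instead of trying another feature
        return {"size": len(rows)}
    cut = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < cut]
    right = [r for r in rows if r[feature] >= cut]
    return {
        "feature": feature,
        "cut": cut,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(node, row, depth=0):
    """Number of splits needed to reach the leaf containing row."""
    if "size" in node:
        return depth + c_factor(node["size"])
    branch = "left" if row[node["feature"]] < node["cut"] else "right"
    return path_length(node[branch], row, depth + 1)

def isolation_scores(rows, n_trees=100, seed=0):
    """Score in (0, 1]; higher means easier to isolate, so more anomalous."""
    rng = random.Random(seed)
    sample_size = min(256, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-sum(path_length(t, r) for t in trees) / (n_trees * norm))
            for r in rows]

# Hypothetical data: 60 points around the origin plus one distant point.
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(60)]
data.append((8.0, 8.0))
scores = isolation_scores(data)
print(f"score of the distant point: {scores[-1]:.2f}")
```

Points that a few random cuts can separate end up with short average paths and scores closer to 1. Increasing `n_trees` until the scores stop changing is one way to check convergence.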

4

Comparing performance

You've now been introduced to a few different algorithms for anomaly scoring. In this final chapter, you'll learn to compare the detection performance of the algorithms in instances where labeled anomalies are available. You'll learn to calculate and interpret the precision and recall statistics for an anomaly score, and how to adapt the algorithms so they can accommodate data with categorical features.

Labeled anomalies

Thyroid data

Visualizing thyroid disease

Anomaly score

Measuring performance

Binarized scores

Cross-tabulate binary scores

Thyroid precision and recall

Working with categorical features

Converting character to factor

Isolation forest with factors

LOF with factors
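
When labeled anomalies are available, precision and recall follow directly from a binarized score. The sketch below is written in Python with the standard library only and is not the course's R code. The function name, the `scores` and `labels` values, and the threshold are hypothetical, not the thyroid data.

```python
def precision_recall(scores, labels, threshold):
    """Binarize scores at threshold and compare with the true labels."""
    predicted = [s >= threshold for s in scores]
    actual = [bool(l) for l in labels]
    tp = sum(p and a for p, a in zip(predicted, actual))      # flagged, truly anomalous
    fp = sum(p and not a for p, a in zip(predicted, actual))  # flagged, actually normal
    fn = sum(a and not p for p, a in zip(predicted, actual))  # missed anomalies
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical anomaly scores and labels (1 = known anomaly).
scores = [0.9, 0.8, 0.75, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
precision, recall = precision_recall(scores, labels, threshold=0.7)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Precision is the share of flagged points that are true anomalies, and recall is the share of true anomalies that were flagged. Moving the threshold trades one against the other.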
