Course

Anomaly Detection in Python

Intermediate skill level

Updated November 2025

In this 4-hour course, detect outliers in your data analysis and expand your Python statistical toolkit.

Python, Probability & Statistics

4 hours

16 videos

59 exercises

4,950 XP

7,191

Statement of Accomplishment

Course Description

Detect Anomalies in Your Data Analysis

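Extreme values, or outliers, are present in almost every dataset, and it is critical to detect and handle them before continuing any statistical exploration. Left untouched, outliers can easily disrupt an analysis and skew the performance of machine learning models.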

Learn to Use Estimators Like Isolation Forest and Local Outlier Factor

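In this course, you'll use Python to implement a variety of anomaly detection methods. You'll spot extreme values visually and apply proven statistical techniques, such as Median Absolute Deviation, to univariate datasets. For multivariate data, you'll learn to use estimators such as Isolation Forest, k-Nearest-Neighbors, and Local Outlier Factor. You'll also learn how to combine multiple outlier classifiers into a low-risk final ensemble estimator. You'll walk away with an essential data science tool: anomaly detection with Python.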

Expand Your Python Statistical Toolkit

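Better anomaly detection leads to a better understanding of your data and, in particular, to better root cause analysis and communication about system behavior. Adding this skill to your existing Python repertoire will help you with data cleaning, fraud detection, and identifying system disturbances.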

Prerequisites

Supervised Learning with scikit-learn

1

Detecting Univariate Outliers

This chapter covers techniques to detect outliers in 1-dimensional data using histograms, scatterplots, box plots, z-scores, and modified z-scores.

What are anomalies and outliers?

Print a 5-number summary

Histograms for outlier detection

Scatterplots for outlier detection

Box plots and IQR

Boxplots for outlier detection

Calculating outlier limits with IQR

Using outlier limits for filtering

Using z-scores for Anomaly Detection

Finding outliers with z-scores

Using modified z-scores with PyOD
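
The IQR limits and modified z-scores listed in this chapter can be sketched with the standard library alone. This is a minimal illustration, not the course's own code: the helper names `iqr_limits` and `modified_z_scores` are hypothetical, the sample values are made up, and the 1.5 factor and 3.5 cutoff are commonly cited defaults assumed here. The course itself uses PyOD for the modified z-score step.

```python
import statistics


def iqr_limits(values, factor=1.5):
    """Return (lower, upper) outlier limits based on the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def modified_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    # 0.6745 makes the MAD comparable to a standard deviation for normal data.
    return [0.6745 * (v - median) / mad for v in values]


# Made-up sample with one obvious extreme value.
data = [10, 12, 11, 13, 12, 11, 95]

lower, upper = iqr_limits(data)
iqr_outliers = [v for v in data if v < lower or v > upper]

scores = modified_z_scores(data)
mad_outliers = [v for v, z in zip(data, scores) if abs(z) > 3.5]

print(lower, upper, iqr_outliers, mad_outliers)
```

Both rules rely on the median and quartiles rather than the mean and standard deviation, so a single extreme value has little influence on the limits themselves.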

2

Isolation Forests with PyOD

In this chapter, you’ll learn the ins and outs of how the Isolation Forest algorithm works. Explore how Isolation Trees are built, the essential parameters of PyOD's IForest and how to tune them, and how to interpret the output of IForest using outlier probability scores.

Getting started with Isolation Forests

The difference between univariate and multivariate anomalies

Detecting outliers with IForest

Overview of Isolation Forest hyperparameters

Most important IForest parameters

Choosing contamination

Choosing n_estimators

Checking the theory

Hyperparameter tuning of Isolation Forest

Tuning contamination

Tuning multiple hyperparameters

Interpreting the output of IForest

Alternative way of classifying with IForest

Using outlier probabilities
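
To make the idea of Isolation Trees concrete, here is a minimal pure-Python sketch of the technique. It is an illustration under stated assumptions, not PyOD's IForest: the function names are hypothetical, the `n_estimators` parameter only borrows its name from the syllabus, and the data is synthetic. Each tree splits on a random feature at a random threshold, and points that end up alone after few splits receive scores closer to 1.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected depth of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split the points on a random feature and random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    low = min(p[feature] for p in points)
    high = max(p[feature] for p in points)
    if low == high:  # simplification: stop if the chosen feature is constant
        return {"size": len(points)}
    split = rng.uniform(low, high)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point lands, adjusted for unsplit leaf contents."""
    if "size" in node:
        return depth + average_path_length(node["size"])
    branch = "left" if point[node["feature"]] < node["split"] else "right"
    return path_length(point, node[branch], depth + 1)


def isolation_scores(points, n_estimators=100, max_samples=256, seed=0):
    """Return one anomaly score per point; higher means easier to isolate."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(max_samples, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
        for _ in range(n_estimators)
    ]
    norm = average_path_length(sample_size)
    return [
        2.0 ** (-sum(path_length(p, t) for t in trees) / (n_estimators * norm))
        for p in points
    ]


# Synthetic data: a Gaussian cluster plus one far-away point (the last one).
data_rng = random.Random(42)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]
points.append((8.0, 8.0))

scores = isolation_scores(points)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
print(most_anomalous, round(scores[most_anomalous], 3))
```

A contamination setting, as covered in this chapter, would then amount to choosing what fraction of the highest-scoring points to label as outliers.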

3

Distance and Density-based Algorithms

After a tree-based outlier classifier, you will explore a class of distance and density-based detectors. KNN and Local Outlier Factor classifiers have been proven highly effective in this area, and you will learn how to use them.

KNN for outlier detection

KNN for the first time

KNN with outlier probabilities

Outlier-robust feature scaling

Finding the Euclidean distance manually

Finding the Euclidean distance with SciPy

Practicing standardization

Testing QuantileTransformer

Hyperparameters of KNN

Differentiating distance metrics

Calculating Manhattan distance manually

Tuning n_neighbors

Tuning the aggregation method

Local Outlier Factor

LOF for the first time

LOF with outlier probabilities
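
The distance metrics and the KNN aggregation choice covered in this chapter can be sketched in a few lines. This is a hedged illustration rather than a library implementation: `knn_scores` is a hypothetical helper, the `method` labels are this sketch's own names, and the points are made up. A point's score is an aggregate of the distances to its k nearest neighbors, so isolated points score high.

```python
import math
import statistics


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_scores(points, k=2, metric=euclidean, method="largest"):
    """Score each point by aggregating distances to its k nearest neighbors."""
    aggregate = {
        "largest": max,
        "mean": statistics.fmean,
        "median": statistics.median,
    }[method]
    scores = []
    for i, point in enumerate(points):
        distances = sorted(
            metric(point, other) for j, other in enumerate(points) if j != i
        )
        scores.append(aggregate(distances[:k]))
    return scores


# Made-up data: four points on a unit square and one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]

scores = knn_scores(points, k=2)
outlier_index = scores.index(max(scores))

print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)), outlier_index)
```

Because the scores depend directly on raw distances, features on very different scales would dominate the result, which is why the chapter pairs KNN with outlier-robust feature scaling.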

4

Time Series Anomaly Detection and Outlier Ensembles

In this chapter, you’ll learn how to perform anomaly detection on time series datasets and make your predictions more stable and trustworthy using outlier ensembles.

Introduction to time series

Working with DateTime columns

Creating a DateTimeIndex

MAD on time series

Isolation Forest on time series

Time Series Decomposition for Outlier Detection

Practicing decomposition

Fitting on residuals

Outlier classifier ensembles

Scaling parts of a dataset

Manual outlier ensembles - creating the arrays

Storing outlier probabilities

Aggregating and thresholding the probabilities

How to deal with identified outliers

Classifying the reasons for outlier presence

When to drop outliers

Non-aggressive methods of dealing with outliers

Congratulations!
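
The manual ensemble steps in this chapter (storing outlier probabilities, then aggregating and thresholding them) can be sketched as follows. This is a minimal illustration under assumptions: `ensemble_outliers` is a hypothetical helper, the probabilities are made-up values standing in for real detector output, and the 0.7 threshold is arbitrary.

```python
import statistics


def ensemble_outliers(probabilities_by_detector, threshold=0.7,
                      aggregate=statistics.fmean):
    """Combine per-detector outlier probabilities row by row, then threshold."""
    columns = probabilities_by_detector.values()
    combined = [aggregate(row) for row in zip(*columns)]
    flags = [p > threshold for p in combined]
    return combined, flags


# Made-up probabilities for four rows from three hypothetical detectors.
probabilities = {
    "iforest": [0.10, 0.20, 0.95, 0.30],
    "knn": [0.05, 0.40, 0.90, 0.20],
    "lof": [0.15, 0.30, 0.85, 0.80],
}

mean_probs, mean_flags = ensemble_outliers(probabilities)
median_probs, median_flags = ensemble_outliers(
    probabilities, aggregate=statistics.median
)

print(mean_flags, median_flags)
```

The last row shows why aggregation adds stability: one detector rates it at 0.80, but the combined probability stays below the threshold because the other two disagree.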
