Spot Anomalies in Your Data Analysis
Extreme values or anomalies are present in almost any dataset, and it is critical to detect and deal with them before continuing statistical exploration. When left untouched, anomalies can easily disrupt your analyses and skew the performance of machine learning models.
Learn to Use Estimators Like Isolation Forest and Local Outlier Factor
In this course, you'll leverage Python to implement a variety of anomaly detection methods. You'll spot extreme values visually and use tested statistical techniques like Median Absolute Deviation for univariate datasets. For multivariate data, you'll learn to use estimators such as Isolation Forest, k-Nearest-Neighbors, and Local Outlier Factor. You'll also learn how to ensemble multiple outlier classifiers into a low-risk final estimator. You'll walk away with an essential data science tool in your belt: anomaly detection with Python.
Expand Your Python Statistical Toolkit
Better anomaly detection means better understanding of your data, and particularly, better root cause analysis and communication around system behavior. Adding this skill to your existing Python repertoire will help you with data cleaning, fraud detection, and identifying system disturbances.
Detecting Univariate Outliers (Free)
This chapter covers techniques to detect outliers in 1-dimensional data using histograms, scatterplots, box plots, z-scores, and modified z-scores.
What are anomalies and outliers? (50 xp)
Print a 5-number summary (100 xp)
Histograms for outlier detection (100 xp)
Scatterplots for outlier detection (100 xp)
Box plots and IQR (50 xp)
Boxplots for outlier detection (100 xp)
Calculating outlier limits with IQR (100 xp)
Using outlier limits for filtering (100 xp)
Using z-scores for Anomaly Detection (50 xp)
Finding outliers with z-scores (100 xp)
Using modified z-scores with PyOD (100 xp)
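To give a feel for two of the techniques in this chapter, here is a minimal sketch of IQR-based outlier limits and modified z-scores. It uses plain NumPy rather than PyOD, the sample values are made up for illustration, and the 1.5 multiplier and 3.5 cutoff are common conventions assumed here, not values taken from the course.

```python
import numpy as np

# Hypothetical 1-dimensional sample; 95.0 is planted as an obvious extreme value.
values = np.array([12.0, 13.5, 11.8, 12.9, 13.1, 12.4, 95.0, 12.7, 13.3, 12.2])

# IQR rule: flag anything further than 1.5 * IQR below Q1 or above Q3.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
iqr_outliers = values[(values < lower_limit) | (values > upper_limit)]

# Modified z-score: built on the median and the Median Absolute Deviation (MAD)
# instead of the mean and standard deviation, so extremes distort it less.
median = np.median(values)
mad = np.median(np.abs(values - median))
modified_z = 0.6745 * (values - median) / mad
mad_outliers = values[np.abs(modified_z) > 3.5]  # 3.5 is an assumed, commonly used cutoff

print("IQR limits:", lower_limit, upper_limit)
print("Flagged by IQR:", iqr_outliers)
print("Flagged by modified z-score:", mad_outliers)
```

Both rules flag only the planted value in this sample. On real data they can disagree, which is one reason to compare several methods before filtering.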
Isolation Forests with PyOD
In this chapter, you’ll learn the ins and outs of how the Isolation Forest algorithm works. Explore how Isolation Trees are built, the essential parameters of PyOD's IForest and how to tune them, and how to interpret the output of IForest using outlier probability scores.
Getting started with Isolation Forests (50 xp)
The difference between univariate and multivariate anomalies (50 xp)
Detecting outliers with IForest (100 xp)
Overview of Isolation Forest hyperparameters (50 xp)
Most important IForest parameters (50 xp)
Choosing contamination (100 xp)
Choosing n_estimators (100 xp)
Checking the theory (50 xp)
Hyperparameter tuning of Isolation Forest (50 xp)
Tuning contamination (100 xp)
Tuning multiple hyperparameters (100 xp)
Interpreting the output of IForest (50 xp)
Alternative way of classifying with IForest (100 xp)
Using outlier probabilities (100 xp)
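As a rough preview of fitting an Isolation Forest, the sketch below swaps in scikit-learn's IsolationForest instead of PyOD's IForest, so the attribute names and label convention differ from what the chapter uses. The synthetic data, the planted points, and the chosen contamination and n_estimators values are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic 2-dimensional data (assumed for illustration): a dense cloud of
# inliers plus four planted points far away from it.
rng = np.random.default_rng(42)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
planted = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.5], [-8.0, -9.0]])
X = np.vstack([inliers, planted])

# contamination is the expected share of outliers; n_estimators is the number
# of isolation trees in the forest. Both values here are illustrative guesses.
iforest = IsolationForest(n_estimators=200, contamination=0.02, random_state=42)
labels = iforest.fit_predict(X)  # scikit-learn convention: -1 = outlier, 1 = inlier

# score_samples returns lower values for more anomalous points, so the sign is
# flipped here to make higher mean "more anomalous".
scores = -iforest.score_samples(X)

outlier_idx = np.where(labels == -1)[0]
print("Flagged row indices:", outlier_idx)
```

Because contamination fixes the share of rows that get flagged, setting it too high marks ordinary points as outliers and setting it too low misses real ones.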
Distance and Density-based Algorithms
After working with a tree-based outlier classifier, you will explore distance- and density-based detectors. KNN and Local Outlier Factor classifiers have proven highly effective in this area, and you will learn how to use them.
KNN for outlier detection (50 xp)
KNN for the first time (100 xp)
KNN with outlier probabilities (100 xp)
Outlier-robust feature scaling (50 xp)
Finding the Euclidean distance manually (100 xp)
Finding the Euclidean distance with SciPy (100 xp)
Practicing standardization (100 xp)
Testing QuantileTransformer (100 xp)
Hyperparameters of KNN (50 xp)
Differentiating distance metrics (100 xp)
Calculating Manhattan distance manually (100 xp)
Tuning n_neighbors (100 xp)
Tuning the aggregation method (100 xp)
Local Outlier Factor (50 xp)
LOF for the first time (100 xp)
LOF with outlier probabilities (100 xp)
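The ideas in this chapter can be sketched in a few lines: computing Euclidean and Manhattan distances by hand, scoring points by the distance to their nearest neighbors, and scoring them by local density. This sketch uses NumPy and scikit-learn rather than PyOD, and the data, the value of k, and the two aggregation choices ("largest" and "mean") are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors

# Distance metrics computed manually for two example points.
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
euclidean = np.sqrt(np.sum((a - b) ** 2))  # straight-line distance
manhattan = np.sum(np.abs(a - b))          # sum of per-axis differences

# Synthetic data (assumed): 100 inliers plus one planted point at row 100.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)), [[7.0, 7.0]]])

# KNN-style scores: ask for k + 1 neighbors because each point is returned
# as its own nearest neighbor at distance zero.
k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
distances, _ = nn.kneighbors(X)
neighbor_dists = distances[:, 1:]            # drop the self-distance column
knn_largest = neighbor_dists.max(axis=1)     # distance to the k-th neighbor
knn_mean = neighbor_dists.mean(axis=1)       # average distance to k neighbors

# Local Outlier Factor compares each point's local density to its neighbors'.
lof = LocalOutlierFactor(n_neighbors=20)
lof_labels = lof.fit_predict(X)              # -1 = outlier, 1 = inlier
lof_scores = -lof.negative_outlier_factor_   # flipped so higher = more anomalous

print("Euclidean:", euclidean, "Manhattan:", manhattan)
print("Most anomalous row (KNN, LOF):", np.argmax(knn_largest), np.argmax(lof_scores))
```

Both detectors depend on distances, so features on very different scales should be rescaled first, which is why the chapter pairs these algorithms with outlier-robust feature scaling.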
Time Series Anomaly Detection and Outlier Ensembles
In this chapter, you’ll learn how to perform anomaly detection on time series datasets and make your predictions more stable and trustworthy using outlier ensembles.
Introduction to time series (50 xp)
Working with DateTime columns (100 xp)
Creating a DateTimeIndex (100 xp)
MAD on time series (100 xp)
Isolation Forest on time series (100 xp)
Time Series Decomposition for Outlier Detection (50 xp)
Practicing decomposition (100 xp)
Fitting on residuals (100 xp)
Outlier classifier ensembles (50 xp)
Scaling parts of a dataset (100 xp)
Manual outlier ensembles - creating the arrays (100 xp)
Storing outlier probabilities (100 xp)
Aggregating and thresholding the probabilities (100 xp)
How to deal with identified outliers (50 xp)
Classifying the reasons for outlier presence (100 xp)
When to drop outliers (100 xp)
Non-aggressive methods of dealing with outliers (100 xp)
Congratulations! (50 xp)
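A manual outlier ensemble can be sketched as follows: score the same data with several detectors, put the scores on a common scale, aggregate them, and apply a threshold. This sketch uses scikit-learn detectors rather than PyOD, substitutes min-max rescaled scores for outlier probabilities, and picks a 0.5 cutoff arbitrarily. The data and the helper name `min_max` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data (assumed): 150 inliers plus two planted points at rows 150 and 151.
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0.0, 1.0, size=(150, 2)), [[9.0, -9.0], [-8.0, 8.5]]])


def min_max(scores):
    """Rescale raw anomaly scores to the [0, 1] range (hypothetical helper)."""
    return (scores - scores.min()) / (scores.max() - scores.min())


# Raw scores from two different detectors, flipped so higher = more anomalous.
iforest_scores = -IsolationForest(random_state=7).fit(X).score_samples(X)
lof_scores = -LocalOutlierFactor(n_neighbors=20).fit(X).negative_outlier_factor_

# Store the rescaled scores side by side, one column per detector.
scaled = np.column_stack([min_max(iforest_scores), min_max(lof_scores)])

# Aggregate by averaging across detectors, then threshold the result.
ensemble_score = scaled.mean(axis=1)
threshold = 0.5  # assumed cutoff; tune it for your own data
flagged = np.where(ensemble_score > threshold)[0]

print("Flagged row indices:", flagged)
```

Averaging means a point is flagged only when the detectors broadly agree, which makes the final call less sensitive to the quirks of any single model. Taking the median or the maximum across columns are alternative aggregation choices with different trade-offs.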
Prerequisites
Supervised Learning with scikit-learn