Course

Handling Missing Data with Imputations in R

Advanced skill level

Updated October 2022

Diagnose, visualize, and treat missing data using a range of imputation techniques, with tips to improve your results.

R, Data Manipulation

4 hours

13 videos

49 exercises

4,200 XP

6,193

Statement of Accomplishment

Course Description

Missing data is everywhere. The process of filling in missing values is known as imputation, and knowing how to correctly fill in missing data is an essential skill if you want to produce accurate predictions and distinguish yourself from the crowd. In this course, you’ll learn how to use visualizations and statistical tests to recognize missing data patterns and how to impute data using a collection of statistical and machine learning models. You’ll also gain decision-making skills, helping you decide which imputation method fits best in a particular situation. Finally, you’ll learn to incorporate uncertainty from imputation into your inference and predictions, making them more robust and reliable.

Prerequisites

Intermediate Regression in R

Dealing With Missing Data in R

1

The Problem of Missing Data

In this chapter, you’ll find out why missing data can be a risk when analyzing a dataset. You’ll be introduced to the three missing data mechanisms and learn how to recognize them using statistical tests and visualization tools.

Missing data: what can go wrong

Linear regression with incomplete data

Analyzing regression output

Comparing models

Missing data mechanisms

Recognizing missing data mechanisms

t-test for MAR: data preparation

t-test for MAR: interpretation

Visualizing missing data patterns

Aggregation plot

Mosaic plot
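
As a rough illustration of the t-test idea named in this chapter, the sketch below uses base R only. The data are simulated, and the variable names (humidity, air_temp) are hypothetical, not taken from the course dataset. If the mean of one variable differs clearly between rows where another variable is missing and rows where it is observed, the missingness is probably not completely at random.

```r
set.seed(1)

# Simulated data; names and values are illustrative assumptions.
n <- 200
humidity <- rnorm(n, mean = 80, sd = 5)
air_temp <- 30 - 0.1 * humidity + rnorm(n)

# Make air_temp missing whenever humidity is high, so the
# missingness depends on an observed variable by construction.
air_temp[humidity > 85] <- NA

# Indicator of missingness in air_temp.
missing_temp <- factor(is.na(air_temp), levels = c(FALSE, TRUE))

# Compare mean humidity between the two groups.
test <- t.test(humidity ~ missing_temp)
print(test$p.value)
```

A small p-value here only says that missingness in air_temp is related to humidity. It cannot rule out that missingness also depends on the unobserved values themselves.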

2

Donor-Based Imputation

Get to know the taxonomy of imputation methods and learn three donor-based techniques: mean, hot-deck, and k-Nearest-Neighbors imputation. You’ll look under the hood to see how these methods work, before learning how to apply them to a real-world tropical weather dataset. Along the way, you’ll also learn useful tricks that you can use to make them work even better for your problems.

Mean imputation

Smelling the danger of mean imputation

Mean-imputing the temperature

Assessing imputation quality with margin plot

Hot-deck imputation

Vanilla hot-deck

Hot-deck tricks & tips I: imputing within domains

Hot-deck tricks & tips II: sorting by correlated variables

k-Nearest-Neighbors imputation

Choosing the number of neighbors

kNN tricks & tips I: weighting donors

kNN tricks & tips II: sorting variables
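
To make two of the donor-based ideas above concrete, here is a minimal base R sketch. The helper functions mean_impute() and hotdeck_impute() and the sample values are hypothetical and written for illustration; they are not the functions used in the course exercises. The hot-deck variant shown is a simple sequential one that borrows the most recent observed value.

```r
# Illustrative temperatures with gaps (made-up values).
air_temp <- c(27.1, NA, 26.4, 28.0, NA, 25.9, 27.6, NA, 26.8, 27.3)

# Mean imputation: every missing value gets the observed mean.
mean_impute <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Sequential hot-deck: each missing value is replaced by the value
# in the preceding position (the "donor"). A leading NA stays NA
# because it has no donor.
hotdeck_impute <- function(x) {
  for (i in seq_along(x)) {
    if (is.na(x[i]) && i > 1) x[i] <- x[i - 1]
  }
  x
}

temp_mean <- mean_impute(air_temp)
temp_hotdeck <- hotdeck_impute(air_temp)

# Mean imputation keeps the mean but shrinks the variance.
var(air_temp, na.rm = TRUE)
var(temp_mean)
```

The drop in variance after mean imputation is one reason the chapter treats it with caution: the imputed values are all identical, so the filled-in variable looks less variable than the real one.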

3

Model-Based Imputation

It’s time to learn how to use statistical and machine learning models, such as linear regression, logistic regression, and random forests, to impute missing data. In this chapter, you’ll look into how the models make their predictions and use this knowledge to draw the imputed values from conditional distributions. This is important as it ensures your imputations are more varied and plausible, making them more similar to the true data.

Model-based imputation approach

Linear regression imputation

Initializing missing values & iterating over variables

Detecting convergence

Replicating data variability

Logistic regression imputation

Drawing from conditional distribution

Model-based imputation with multiple variable types

Tree-based imputation

Imputing with random forests

Variable-wise imputation errors

Speed-accuracy trade-off
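
The difference between imputing with a model's point prediction and drawing from a conditional distribution can be sketched in base R as follows. This is a simplified single-pass example on simulated data with hypothetical variable names. It assumes normally distributed residuals and ignores uncertainty in the regression coefficients.

```r
set.seed(42)

# Simulated data; names and coefficients are illustrative assumptions.
n <- 100
humidity <- rnorm(n, mean = 80, sd = 5)
air_temp <- 40 - 0.15 * humidity + rnorm(n, sd = 0.5)
weather <- data.frame(humidity, air_temp)
weather$air_temp[sample(n, 20)] <- NA

miss <- is.na(weather$air_temp)

# Fit on the complete rows (lm drops incomplete rows by default).
fit <- lm(air_temp ~ humidity, data = weather)
pred <- predict(fit, newdata = weather[miss, ])
resid_sd <- summary(fit)$sigma

# Option 1: plug in the predicted means (too little variability).
imputed_point <- weather$air_temp
imputed_point[miss] <- pred

# Option 2: draw from a normal distribution centred on each
# prediction, with the residual standard deviation as its spread.
imputed_draw <- weather$air_temp
imputed_draw[miss] <- rnorm(sum(miss), mean = pred, sd = resid_sd)
```

With option 1 the imputed points all lie exactly on the regression line, while option 2 scatters them around it in a way that resembles the observed data.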

4

Uncertainty from Imputation

Imputed values are not set in stone. They are just estimates, and estimates come with some uncertainty. In this final chapter, you’ll discover how bootstrapping and chained equations, implemented in the mice package, can be used to incorporate imputation uncertainty into your models and analyses to make them more reliable and robust.

Multiple imputation by bootstrapping

Wrapping imputation & modeling in a function

Running the bootstrap

Bootstrapping confidence intervals

Multiple imputation by chained equations

The mice flow: mice - with - pool

Choosing default models

Using predictor matrix

Putting it all together

Analyzing missing data patterns

Imputing and inspecting outcomes

Inference with imputed data

Final remarks
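
The bootstrap idea from this chapter (wrap imputation and modeling in one function, then repeat it on resampled data) can be sketched in base R as below. The data are simulated, the function impute_and_fit() is hypothetical, and mean imputation is used only as a placeholder for whichever imputation method is preferred. The sketch uses a plain replicate() loop and percentile intervals, which may differ from the tooling used in the course.

```r
set.seed(123)

# Simulated data; names and coefficients are illustrative assumptions.
n <- 150
humidity <- rnorm(n, mean = 80, sd = 5)
air_temp <- 40 - 0.15 * humidity + rnorm(n, sd = 0.5)
weather <- data.frame(humidity, air_temp)
weather$humidity[sample(n, 30)] <- NA

# Impute, fit the model, and return the quantity of interest.
impute_and_fit <- function(data) {
  # Placeholder imputation step: replace with any method.
  data$humidity[is.na(data$humidity)] <- mean(data$humidity, na.rm = TRUE)
  coef(lm(air_temp ~ humidity, data = data))[["humidity"]]
}

# Resample rows with replacement, then impute and model each replicate.
n_boot <- 200
boot_estimates <- replicate(n_boot, {
  rows <- sample(nrow(weather), replace = TRUE)
  impute_and_fit(weather[rows, ])
})

# Percentile-based 95% interval for the humidity coefficient.
ci <- quantile(boot_estimates, probs = c(0.025, 0.975))
print(ci)
```

Because the imputation is redone inside every replicate, the spread of boot_estimates reflects both sampling variability and the variability introduced by imputing.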
