Interactive Course

Dealing With Missing Data in R

Make it easy to visualise, explore, and impute missing data with naniar, a tidyverse-friendly approach to missing data.

  • 4 hours
  • 14 Videos
  • 52 Exercises
  • 3,158 Participants
  • 4,350 XP

Course Description

Missing data is part of any real-world data analysis. It can crop up in unexpected places, making analyses challenging to understand. In this course, you will learn how to use tidyverse tools and the naniar R package to visualize missing values. You'll tidy missing values so they can be used in analysis, explore them to find bias in the data, and reveal other underlying patterns of missingness. Finally, you'll learn how to "fill in the blanks" of missing values with imputation models, and how to visualize, assess, and make decisions based on these imputed datasets.
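One reason missing data makes analyses hard to follow is that R propagates `NA` through most calculations. The short base-R sketch below uses a made-up vector `x` to show that behaviour, along with the usual ways to detect and count missing values. naniar's `n_miss()` and `prop_miss()` report the same count and proportion. The example data is illustrative only.

```r
# Made-up example vector with two missing values
x <- c(4, NA, 7, 1, NA)

# Most operations involving NA return NA
NA + 1                  # NA
mean(x)                 # NA, because x contains missing values
mean(x, na.rm = TRUE)   # 4, the mean of 4, 7 and 1

# Detecting and counting missing values
is.na(x)                # FALSE  TRUE FALSE FALSE  TRUE
sum(is.na(x))           # 2 missing values
mean(is.na(x))          # 0.4, the proportion missing
anyNA(x)                # TRUE

# Things that look missing but behave differently
is.na(NaN)              # TRUE: is.na() also flags NaN
is.na("NA")             # FALSE: a character string, not a missing value
length(NULL)            # 0: NULL is an empty object, not a missing value
```

Counting with `sum(is.na(x))` and `mean(is.na(x))` works because `TRUE` and `FALSE` are treated as 1 and 0 in arithmetic.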

  1. Why care about missing data? (Free)

    Chapter 1 introduces you to missing data, explaining what missing values are, how they behave in R, and how to detect and count them. We then introduce missing data summaries: how to summarise missingness across cases and variables, and how to explore it across groups within the data. Finally, we discuss missing data visualizations: how to produce overview visualizations for the entire dataset and for variables, cases, and other summaries, and how to explore these across groups.

  2. Wrangling and tidying up missing values

    In this chapter, you will learn how to uncover hidden missing values like "missing" or "N/A" and replace them with `NA`. You will learn how to efficiently handle implicit missing values: values that are implied to be missing but are not explicitly listed. We also cover how to explore missing data dependence, discussing Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR), and what they mean for your data analysis.

  3. Testing missing relationships

    In this chapter, you will learn about workflows for working with missing data. We introduce two special data structures, the shadow matrix and nabular data, and demonstrate how to use them in workflows for exploring missing data so that you can link summaries of missingness back to values in the data. You will learn how to use ggplot to explore and visualize how values change as other variables go missing. Finally, you will learn how to visualize missingness across two variables, and how and why to visualize missing values in a scatterplot.

  4. Connecting the dots (Imputation)

    In this chapter, you will learn about filling in the missing values in your data, which is called imputation. You will learn how to impute and track missing values, and what the good and bad features of imputations are, so that you can explore, visualise, and evaluate the imputed data against the original values. You will learn how to use, evaluate, and compare different imputation models, and explore how the choice of model affects the inferences you can draw.
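The workflow these chapters describe can be sketched in base R. It covers making hidden missing values explicit, building a shadow matrix and nabular data, and imputing below the observed range. This is a minimal illustration under stated assumptions. The `survey` data and the helper `impute_below_range()` are hypothetical. The code only mimics the idea behind naniar functions such as `replace_with_na()`, `bind_shadow()`, and `impute_below()`; it does not reproduce their implementations. naniar's exact offsets and defaults may differ.

```r
# Hypothetical survey data: "N/A" and "missing" are hidden missing values
survey <- data.frame(
  age    = c(23, 41, NA, 35, 58),
  income = c("52000", "N/A", "61000", "missing", "48000"),
  stringsAsFactors = FALSE
)

# 1. Wrangle: turn hidden missing codes into explicit NA, then convert to numeric
hidden_codes <- c("N/A", "missing")
survey$income[survey$income %in% hidden_codes] <- NA
survey$income <- as.numeric(survey$income)

# 2. Shadow matrix: one factor column per variable recording "!NA" or "NA"
shadow <- as.data.frame(lapply(survey, function(col) {
  factor(ifelse(is.na(col), "NA", "!NA"), levels = c("!NA", "NA"))
}))
names(shadow) <- paste0(names(survey), "_NA")

# Nabular data: the original columns bound to their shadow columns
nabular <- cbind(survey, shadow)

# Link missingness back to values: mean age when income is observed vs missing
age_by_income_missing <- tapply(nabular$age, nabular$income_NA, mean, na.rm = TRUE)
print(age_by_income_missing)  # !NA: 40.5, NA: 38

# 3. Hypothetical helper: impute missing values 10% of the range below the minimum
impute_below_range <- function(col, prop_below = 0.1) {
  rng <- range(col, na.rm = TRUE)
  col[is.na(col)] <- rng[1] - prop_below * diff(rng)
  col
}
nabular$income_imp <- impute_below_range(nabular$income)  # NAs become 46700
nabular$age_imp    <- impute_below_range(nabular$age)     # NA becomes 19.5
```

Because the imputed values sit just below each variable's observed range, they are easy to spot in a scatterplot of `income_imp` against `age_imp`. The shadow columns let you colour those points by whether they were originally missing. This sketch does not add the random jitter that some implementations use to separate overlapping points.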

What do other learners have to say?

“I've used other sites, but DataCamp's been the one that I've stuck with.”

Devon Edwards Joseph

Lloyd's Banking Group

“DataCamp is the top resource I recommend for learning data science.”

Louis Maiden

Harvard Business School

“DataCamp is by far my favorite website to learn from.”

Ronald Bowers

Decision Science Analytics @ USAA

Nicholas Tierney

Statistician

I recently completed my PhD in Statistics at QUT, and am now a Research Fellow in Statistics at Monash University working with Rob Hyndman and Di Cook in the NUMBAT group. My research aims to improve data analysis workflow. This includes statistical modeling, calculating diagnostics, drawing inferences and making decisions. Crucial to this work is producing high-quality software to accompany each research idea. My work so far has focussed on the importance of knowing your data (visdat), and on creating principles and tools that make it easier to work with, explore, and model missing data (naniar). I have also implemented theoretical optimization models to identify and relocate facilities to maximize their coverage of a population, in the R package maxcovr, and am interested in testing whether commonly used diagnostics for MCMC methods are used effectively by researchers. I love the R programming language and how it has transformed my world.

Collaborators
  • David Campos
  • Shon Inouye
  • Chester Ismay
  • Sascha Mayr
