Data Privacy and Anonymization in R

Publicly release data sets with a differential privacy guarantee.

  • 4 hours
  • 13 Videos
  • 45 Exercises
  • 1,271 Participants
  • 3,650 XP

Course Description

With social media and big data everywhere, data privacy has become a growing public concern. Recognizing this, organizations such as Google, Apple, and the US Census Bureau are promoting better privacy techniques, specifically differential privacy, a mathematical condition that quantifies privacy risk. In this course, you will learn to code basic data privacy methods and a differentially private algorithm that builds on several properties of differential privacy. With these tools in hand, you will learn how to generate a basic synthetic (fake) data set with a differential privacy guarantee for public release.

Course Outline

  1. Introduction to Data Privacy (Free)

    This chapter covers some basic data privacy techniques that statisticians use to anonymize data. You'll first learn how to remove identifiers and then generate synthetic data from probability distributions.

  2. Introduction to Differential Privacy

    After covering the basic data privacy techniques, you'll learn the concepts behind differential privacy and how to implement the most widely used differentially private algorithm, the Laplace mechanism.

  3. Differentially Private Properties

    In this chapter, you will learn the properties of differential privacy, such as the combination rules and post-processing, so that you can properly implement the Laplace mechanism for various kinds of data questions.

  4. Differentially Private Data Synthesis

    In this chapter, you will learn how to release simple data sets publicly using differentially private data synthesis techniques.
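
The Laplace mechanism, the combination rules, and post-processing named in the outline can be illustrated with a minimal sketch. It is written in Python rather than the course's R, and it is not taken from the course materials: the `ages` values, the function names `laplace_noise` and `private_count`, and the total privacy budget of 1.0 are all assumptions chosen for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent Exponential(rate = 1/scale) draws
    # follows a Laplace distribution with mean 0 and the given scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng):
    # A counting query changes by at most 1 when one record is added or
    # removed, so its global sensitivity is 1 and the noise scale is 1/epsilon.
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)                    # fixed seed so the sketch is repeatable
ages = [23, 35, 41, 29, 62, 57, 33, 48]    # made-up data, for illustration only

# Sequential composition: two queries on the same records share the total
# budget, so each one is answered with half of it.
total_epsilon = 1.0
eps_each = total_epsilon / 2

noisy_over_40 = private_count(ages, lambda a: a > 40, eps_each, rng)
noisy_under_30 = private_count(ages, lambda a: a < 30, eps_each, rng)

# Post-processing: rounding and clamping a noisy answer uses no extra budget.
released = max(0, round(noisy_over_40))
print(noisy_over_40, noisy_under_30, released)
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of accuracy, which is why the budget split across queries matters.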

Course Instructor

Claire Bowen

Postdoctoral Researcher at the Los Alamos National Laboratory

Claire McKay Bowen is a Postdoctoral Researcher in the Statistical Science Group at the Los Alamos National Laboratory. She conducts research in uncertainty quantification with physics-informed Bayesian model updating and in data privacy via differentially private data synthesis methods. Her other interests include statistical computing, scientific communication, and STEM outreach.

Collaborators
  • Chester Ismay
  • Sumedh Panchadhar

