Course

Data Privacy and Anonymization in Python

Advanced skill level

Updated June 2022

Learn how to process sensitive information safely with privacy-preserving techniques.

Python, Machine Learning

4 hours

16 videos

49 exercises

3,850 XP

3,757

Statement of Accomplishment

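Course Description

Data privacy has never been more important. So how do you balance privacy against the need to collect and share valuable business insights? In this course, you'll learn how to do just that, using the same methods as Google and Amazon, including data generalization and privacy models such as k-anonymity and differential privacy. Along with covering topics such as the GDPR, you'll also learn how to build and train machine learning models in Python while protecting sensitive user data such as employee information and income data. Let's get started!

Prerequisites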

Unsupervised Learning in Python

1

Introduction to Data Privacy

Get ready to apply anonymization techniques such as data suppression, masking, synthetic data generation, and generalization. In this chapter, you'll learn how to distinguish between sensitive and non-sensitive personally identifiable information (PII), how to recognize quasi-identifiers, and the basics of the GDPR. You'll also encounter real-life examples of what can go wrong if you don't follow these best practices.

What's private, and why do we care?

Privacy is power

Is it sensitive or non-sensitive?

Suppression of sensitive attributes

Data masking and data generation with Faker

Masking sensitive PII

Removing names with faker

Anonymizing with data generalization

Reducing identification risk with generalization

Data aggregation and data generalization

Top and bottom coding White House salaries
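
To make the techniques named in this chapter concrete, here is a minimal sketch of suppression, masking, generalization, and top and bottom coding using only the Python standard library. The records, the helper names (`generalize_age`, `top_bottom_code`, `mask`), and the salary thresholds are invented for illustration and are not taken from the course materials.

```python
def generalize_age(age, width=10):
    """Replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def top_bottom_code(value, low, high):
    """Clamp a value into [low, high] so extreme values do not stand out."""
    return max(low, min(value, high))


def mask(text, keep_last=4, symbol="*"):
    """Hide all but the last `keep_last` characters of a string."""
    if keep_last <= 0:
        return symbol * len(text)
    hidden = max(len(text) - keep_last, 0)
    return symbol * hidden + text[hidden:]


# Hypothetical records, invented for illustration only.
records = [
    {"name": "A. Example", "age": 34, "salary": 52_000, "phone": "555-0101"},
    {"name": "B. Sample", "age": 61, "salary": 410_000, "phone": "555-0199"},
]

anonymized = [
    {
        # "name" is suppressed by not copying it into the output at all.
        "age_range": generalize_age(r["age"]),
        "salary": top_bottom_code(r["salary"], low=30_000, high=150_000),
        "phone": mask(r["phone"]),
    }
    for r in records
]

for row in anonymized:
    print(row)
```

Each step trades some analytical detail for lower identification risk, so the bin width and the coding thresholds are choices to tune per dataset rather than fixed values.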

2

More on Privacy-Preserving Techniques

Discover how to anonymize data by sampling from datasets following the probability distribution of the columns. You’ll then learn how to apply the k-anonymity privacy model to prevent linkage or re-identification attacks and use hierarchies to perform data generalization in categorical variables.

Anonymizing categorical data

Explore the distribution of data

Sampling from the same probability distribution

Anonymizing continuous data

Different distributions

Sampling from the best continuous distribution

Introduction to K-anonymity

Privacy attributes

Generalizing into ranges

Generalizing data using hierarchies

Using hierarchies for categorical data

K-anonymizing a dataset
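
As a rough illustration of the k-anonymity model covered in this chapter, the sketch below measures k as the size of the smallest group of rows that share the same quasi-identifier values, then shows how generalizing one column can raise it. The rows, the column names, and the helpers `k_of` and `generalize_zip` are hypothetical; a real workflow would also have to decide which columns count as quasi-identifiers.

```python
from collections import Counter


def k_of(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def generalize_zip(zip_code, digits_kept=3):
    """Keep the leading digits of a ZIP code and replace the rest with '*'."""
    return zip_code[:digits_kept] + "*" * (len(zip_code) - digits_kept)


# Hypothetical rows, invented for illustration only.
rows = [
    {"zip": "13053", "age_range": "20-29", "condition": "flu"},
    {"zip": "13068", "age_range": "20-29", "condition": "asthma"},
    {"zip": "14850", "age_range": "30-39", "condition": "flu"},
    {"zip": "14853", "age_range": "30-39", "condition": "diabetes"},
]
quasi_identifiers = ["zip", "age_range"]

# Every (zip, age_range) pair is unique, so each row is its own group.
k_before = k_of(rows, quasi_identifiers)

# Generalizing the ZIP code one level up a hierarchy merges rows into larger groups.
generalized = [{**row, "zip": generalize_zip(row["zip"])} for row in rows]
k_after = k_of(generalized, quasi_identifiers)

print(k_before, k_after)
```

A dataset is k-anonymous for a chosen k when this measured value is at least k; if it is not, further generalization or suppression of outlier rows are the usual next steps.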

3

Differential Privacy

Learn about differential privacy, the model used by major technology companies such as Apple, Google, and Uber. In this chapter, you’ll explore data by generating private histograms and computing private averages in data. You’ll also create differentially private machine learning models that allow businesses to increase the utility of their data.

Introduction to differential privacy

Epsilon (ϵ): the magic number

Histograms with differential privacy

Privacy budgets

Using privacy budgets

When no budget is left

Exploring data with a privacy budget accountant

Differentially private machine learning models

Build a differentially private classifier

Predicting salaries

Differentially private clustering models

Pre-processing data

Segmenting customers
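
One common way to compute a private average is the Laplace mechanism, sketched below with the standard library only. This is an illustrative stand-in, not necessarily the tooling the course uses: the function names, the bounds, and the assumption that the number of records is public are all choices made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_mean(values, lower, upper, epsilon, rng=None):
    """Estimate the mean of `values` with epsilon-differential privacy.

    Assumptions (for illustration): the record count is public, and
    [lower, upper] is chosen without looking at the data.
    """
    if not values:
        raise ValueError("values must not be empty")
    if rng is None:
        rng = random.Random()
    # Clip each value so a single record can change the sum by at most (upper - lower).
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    # Changing one record moves the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical salaries, invented for illustration only.
salaries = [41_000, 52_500, 63_000, 58_250, 300_000]
noisy_mean = private_mean(salaries, lower=20_000, upper=200_000, epsilon=0.5)
print(round(noisy_mean, 2))
```

A smaller epsilon means more noise and stronger privacy, and every released statistic spends part of the overall privacy budget. Naive floating-point Laplace sampling like this has known weaknesses, so a vetted differential privacy library is the safer choice outside of teaching examples.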

4

Anonymizing and Releasing Datasets

In this final chapter, you’ll learn how to apply dimensionality reduction methods such as principal component analysis (PCA) to anonymize large multi-column datasets. You’ll then use Faker to generate realistic and consistent datasets, and scikit-learn to create synthetic datasets that follow a normal distribution. Lastly, you’ll tie everything you learned in this course together as you combine multiple techniques to safely release datasets to the public.

PCA for anonymization

Anonymization of high-dimensional data

Data masking with PCA

Generating realistic datasets with Faker

Consistent synthetic dataset

Datasets with the same probabilistic distribution

Creating synthetic datasets using scikit-learn

Generating datasets for classification

Generating datasets for clustering

Safely release datasets to the public

Exploring and pseudonymizing a dataset

Preparing employee data for safe release

Great work!
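
As one possible take on pseudonymizing a dataset before release, the sketch below replaces a direct identifier with a keyed hash so that the same person always maps to the same pseudonym. Using HMAC-SHA256 here is an assumption made for this example (the chapter description itself names Faker, PCA, and scikit-learn), and the key, records, and `pseudonymize` helper are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key, length=12):
    """Map a value to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Hypothetical key: in practice it must be kept secret and stored apart from the data.
secret_key = b"replace-with-a-secret-key"

# Hypothetical records, invented for illustration only.
employees = [
    {"name": "A. Example", "department": "Sales"},
    {"name": "B. Sample", "department": "Engineering"},
    {"name": "A. Example", "department": "Sales"},
]

released = [
    {
        # The name itself is dropped; only the pseudonym is released.
        "employee_id": pseudonymize(e["name"], secret_key),
        "department": e["department"],
    }
    for e in employees
]

for row in released:
    print(row)
```

Pseudonymized data can still be re-identified through the remaining columns, which is why it is typically combined with generalization or other techniques before a public release.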
