Course

Data Privacy and Anonymization in Python

Skill level: Advanced

Updated 06/2022

Learn to process sensitive information with privacy-preserving techniques.

Python, Machine Learning

4 hours

16 videos

49 exercises

3,850 XP

3,757

Statement of Accomplishment

Course Description

Data privacy has never been more important. But how do you balance privacy with the need to gather and share valuable business insights? In this course, you'll learn how to do just that, using the same methods as Google and Amazon—including data generalization and privacy models, like k-Anonymity and differential privacy. In addition to touching on topics such as GDPR, you'll also discover how to build and train machine learning models in Python while protecting users’ sensitive information such as employee and income data. Let’s get started!

Prerequisites

Unsupervised Learning in Python

1

Introduction to Data Privacy

Get ready to apply anonymization techniques such as data suppression, masking, synthetic data generation, and generalization. In this chapter, you’ll learn how to distinguish between sensitive and non-sensitive personally identifiable information (PII), quasi-identifiers, and the basics of the GDPR. You'll also encounter real-life examples of what can go wrong if you don't follow these best practices.
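
As a rough illustration of three of the techniques named above (suppression, masking, and generalization), here is a minimal sketch using only the Python standard library. The records, field names, and helper functions (`suppress`, `mask_email`, `generalize_age`) are made up for this example and are not taken from the course exercises.

```python
# Toy records; every field name and value here is invented for illustration.
records = [
    {"name": "Ana Ruiz", "email": "ana.ruiz@example.com", "age": 34, "salary": 52000},
    {"name": "Ben Okafor", "email": "ben.okafor@example.com", "age": 47, "salary": 61000},
]

def suppress(record, columns):
    """Suppression: drop the listed attributes entirely."""
    return {key: value for key, value in record.items() if key not in columns}

def mask_email(email):
    """Masking: hide the local part of an email address, keep the domain."""
    local, _, domain = email.partition("@")
    return "*" * len(local) + "@" + domain

def generalize_age(age, width=10):
    """Generalization: replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

anonymized = []
for rec in records:
    out = suppress(rec, {"name"})            # remove the direct identifier
    out["email"] = mask_email(out["email"])  # mask sensitive PII
    out["age"] = generalize_age(out["age"])  # coarsen a quasi-identifier
    anonymized.append(out)

print(anonymized[0])
```

Which columns to suppress, mask, or generalize depends on the dataset; the choices here are only one plausible assignment.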

What's private, and why do we care?

Privacy is power

Is it sensitive or non-sensitive?

Suppression of sensitive attributes

Data masking and data generation with Faker

Masking sensitive PII

Removing names with faker

Anonymizing with data generalization

Reducing identification risk with generalization

Data aggregation and data generalization

Top and bottom coding White House salaries
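
Top and bottom coding, the subject of the last lesson in this chapter, can be sketched as clamping values into a chosen range so that extreme values no longer single anyone out. The salary figures and thresholds below are invented, and `top_bottom_code` is a hypothetical helper rather than code from the course.

```python
def top_bottom_code(values, lower, upper):
    """Clamp each value into [lower, upper] so outliers stop standing out."""
    return [min(max(value, lower), upper) for value in values]

salaries = [18000, 42000, 55000, 61000, 250000]  # made-up numbers
coded = top_bottom_code(salaries, lower=30000, upper=100000)
print(coded)
```

In practice the thresholds are often taken from percentiles of the column rather than fixed by hand.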

2

More on Privacy-Preserving Techniques

Discover how to anonymize data by sampling from datasets following the probability distribution of the columns. You’ll then learn how to apply the k-anonymity privacy model to prevent linkage or re-identification attacks and use hierarchies to perform data generalization in categorical variables.
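
One way to read "sampling from datasets following the probability distribution of the columns" is to replace a categorical column with fresh draws from its own empirical distribution. The sketch below assumes that reading and uses only the standard library; `sample_like` and the department values are hypothetical.

```python
import random
from collections import Counter

def sample_like(column, rng):
    """Draw a new column of the same length from the observed category frequencies."""
    counts = Counter(column)
    categories = list(counts)
    weights = [counts[category] for category in categories]
    return rng.choices(categories, weights=weights, k=len(column))

rng = random.Random(0)  # seeded only to make the demo repeatable
departments = ["Sales"] * 6 + ["R&D"] * 3 + ["HR"]
synthetic = sample_like(departments, rng)
print(Counter(synthetic))
```

The sampled column keeps roughly the same proportions but breaks the link between any row and its original value.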

Anonymizing categorical data

Explore the distribution of data

Sampling from the same probability distribution

Anonymizing continuous data

Different distributions

Sampling from the best continuous distribution

Introduction to K-anonymity

Privacy attributes

Generalizing into ranges

Generalizing data using hierarchies

Using hierarchies for categorical data

K-anonymizing a dataset
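
A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The sketch below, with invented rows and hypothetical helpers `k_of` and `to_range`, measures k before and after generalizing ages into ranges.

```python
from collections import Counter

def k_of(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)
    return min(groups.values())

def to_range(age, width=10):
    """Generalize an exact age into a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

rows = [  # made-up records
    {"age": 31, "zip": "28001", "diagnosis": "A"},
    {"age": 36, "zip": "28001", "diagnosis": "B"},
    {"age": 42, "zip": "28002", "diagnosis": "A"},
    {"age": 48, "zip": "28002", "diagnosis": "C"},
]

before = k_of(rows, ["age", "zip"])  # every row is unique on (age, zip)
generalized = [{**rec, "age": to_range(rec["age"])} for rec in rows]
after = k_of(generalized, ["age", "zip"])
print(before, after)
```

Here generalization raises k from 1 to 2; a real dataset may need wider ranges or hierarchy-based generalization of categorical columns to reach the target k.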

3

Differential Privacy

Learn about differential privacy, the model used by major technology companies such as Apple, Google, and Uber. In this chapter, you’ll explore data by generating private histograms and computing private averages in data. You’ll also create differentially private machine learning models that allow businesses to increase the utility of their data.
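
Private averages and histograms are commonly built on the Laplace mechanism, which adds noise scaled to sensitivity divided by epsilon. The following is a standard-library sketch of that general idea, not the library calls used in the course; it assumes neighbouring datasets differ by replacing one record, that the number of records is public, and that the bounds `lower` and `upper` are chosen without looking at the data.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_mean(values, lower, upper, epsilon, rng):
    """Epsilon-DP mean of values clipped to [lower, upper]."""
    clipped = [min(max(v, lower), upper) for v in values]
    # Replacing one record moves the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    return sum(clipped) / len(clipped) + laplace_noise(sensitivity / epsilon, rng)

def private_histogram(values, bins, epsilon, rng):
    """Epsilon-DP counts for half-open bins given as (low, high) pairs."""
    counts = [sum(low <= v < high for v in values) for low, high in bins]
    # Replacing one record changes at most two counts by 1 each: L1 sensitivity 2.
    return [count + laplace_noise(2 / epsilon, rng) for count in counts]

rng = random.Random(42)  # seeded only to make the demo repeatable
ages = [23, 35, 41, 52, 67]  # made-up data
print(private_mean(ages, lower=18, upper=90, epsilon=0.5, rng=rng))
print(private_histogram(ages, bins=[(18, 40), (40, 65), (65, 91)], epsilon=0.5, rng=rng))
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon gives more accurate but less private answers.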

Introduction to differential privacy

Epsilon (ϵ): the magic number

Histograms with differential privacy

Privacy budgets

Using privacy budgets

When no budget is left

Exploring data with a privacy budget accountant

Differentially private machine learning models

Build a differentially private classifier

Predicting salaries

Differentially private clustering models

Pre-processing data

Segmenting customers
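
The privacy budget lessons in this chapter rest on sequential composition: the epsilons of successive queries on the same data add up, and once the total is spent no further queries should be answered. The class below is a hypothetical minimal accountant written for illustration; it is not the accountant class of any particular library.

```python
class BudgetExhausted(Exception):
    """Raised when a query would exceed the total privacy budget."""

class SimpleAccountant:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        """Record a query costing `epsilon`, or refuse it if the budget would be exceeded."""
        if self.spent + epsilon > self.total:
            raise BudgetExhausted(
                f"requested {epsilon}, only {self.remaining:.3f} left"
            )
        self.spent += epsilon

    @property
    def remaining(self):
        return max(self.total - self.spent, 0.0)

accountant = SimpleAccountant(total_epsilon=1.0)
accountant.spend(0.4)      # first query
accountant.spend(0.4)      # second query
try:
    accountant.spend(0.4)  # would bring the total to 1.2
except BudgetExhausted as error:
    print("Query refused:", error)
```

The refused query leaves the spent total unchanged, so a cheaper query could still be answered afterwards.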

4

Anonymizing and Releasing Datasets

In this final chapter, you’ll learn how to apply dimensionality reduction methods such as principal component analysis (PCA) to anonymize large multi-column datasets. You’ll then use Faker to generate realistic and consistent datasets, and scikit-learn to create synthetic datasets that follow a normal distribution. Lastly, you’ll tie everything you learned in this course together as you combine multiple techniques to safely release datasets to the public.
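
"Consistent" generation means the same real value always maps to the same fake value, so joins and group-bys still work after pseudonymization. The course uses Faker for this; the sketch below is a standard-library stand-in that does not use Faker's API, with invented name pools and a hypothetical `make_pseudonymizer` helper.

```python
import random

# Small invented name pools; 6 x 6 = 36 distinct pseudonyms are available.
FIRST_NAMES = ["Alex", "Sam", "Kim", "Lee", "Noa", "Max"]
LAST_NAMES = ["Stone", "Rivers", "Field", "Brook", "Hill", "Wood"]

def make_pseudonymizer(seed):
    """Return a function that maps each real name to one stable, unique fake name."""
    rng = random.Random(seed)
    mapping = {}
    used = set()

    def pseudonymize(real_name):
        if real_name not in mapping:
            if len(used) == len(FIRST_NAMES) * len(LAST_NAMES):
                raise ValueError("name pool exhausted")
            while True:  # retry until an unused fake name comes up
                candidate = f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
                if candidate not in used:
                    break
            used.add(candidate)
            mapping[real_name] = candidate
        return mapping[real_name]

    return pseudonymize

pseudonymize = make_pseudonymizer(seed=7)
names = ["Ana Ruiz", "Ben Okafor", "Ana Ruiz"]  # made-up names with a repeat
fake_names = [pseudonymize(name) for name in names]
print(fake_names)
```

The mapping table itself is sensitive: anyone holding it can reverse the pseudonyms, so it should be stored separately or discarded before release.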

PCA for anonymization

Anonymization of high-dimensional data

Data masking with PCA

Generating realistic datasets with Faker

Consistent synthetic dataset

Datasets with the same probabilistic distribution

Creating synthetic datasets using scikit-learn

Generating datasets for classification

Generating datasets for clustering

Safely release datasets to the public

Exploring and pseudonymizing a dataset

Preparing employee data for safe release

Great work!

Instructor

Rebeca Gonzalez

Data Scientist, Hiberus Tecnologia

Course resources

IBM HR Analytics Employee Attrition & Performance dataset

US Adult Income dataset

Mall Customers dataset

2017-2018 NBA Salaries dataset
