Daniel Chen

Daniel is a Software Carpentry instructor and a doctoral student in Genetics, Bioinformatics, and Computational Biology at Virginia Tech, where he works in the Social and Decision Analytics Laboratory under the Biocomplexity Institute. He received his MPH in Epidemiology from the Mailman School of Public Health and is interested in integrating hospital data to perform predictive health analytics and build support tools for clinicians. An advocate of open science, he aspires to bridge data science with epidemiology and health care.


Hugo Bowne-Anderson

Course Description

A vital part of data science is acquiring raw data and getting it into a form ready for analysis. In fact, it is commonly said that data scientists spend 80% of their time cleaning and manipulating data, and only 20% of their time actually analyzing it. This course will equip you with the skills you need to clean your data in Python, from diagnosing problems in your data to dealing with missing values and outliers. At the end of the course, you'll apply the techniques you've learned to a case study in which you'll clean a real-world Gapminder dataset!
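To give a flavor of what diagnosing problems, handling missing values, and spotting outliers can look like, here is a minimal sketch that uses only the Python standard library. It is not taken from the course material: the records, the column name `life_expectancy`, and the helper functions are all hypothetical, and the median fill and 1.5 × IQR rule are just one common choice among several.

```python
import statistics

# Hypothetical records invented for illustration; None marks a missing value.
rows = [
    {"country": "A", "life_expectancy": 71.2},
    {"country": "B", "life_expectancy": None},
    {"country": "C", "life_expectancy": 69.8},
    {"country": "D", "life_expectancy": 70.5},
    {"country": "E", "life_expectancy": 7.1},  # suspicious: possibly a data-entry error
    {"country": "F", "life_expectancy": 72.0},
]


def count_missing(rows, column):
    """Diagnose: count how many records have no value in `column`."""
    return sum(1 for row in rows if row[column] is None)


def fill_missing_with_median(rows, column):
    """Return new records with missing values replaced by the observed median."""
    observed = [row[column] for row in rows if row[column] is not None]
    median = statistics.median(observed)
    return [
        {**row, column: median if row[column] is None else row[column]}
        for row in rows
    ]


def flag_outliers(rows, column):
    """Return records whose value lies outside 1.5 * IQR of the quartiles."""
    values = [row[column] for row in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [row for row in rows if not (low <= row[column] <= high)]


print("missing before:", count_missing(rows, "life_expectancy"))
cleaned = fill_missing_with_median(rows, "life_expectancy")
print("missing after:", count_missing(cleaned, "life_expectancy"))
print("flagged:", [row["country"] for row in flag_outliers(cleaned, "life_expectancy")])
```

Flagging an outlier is not the same as deleting it: whether a flagged value is an error or a genuine extreme is a judgment that depends on the dataset.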

1. Exploring your data

2. Tidying data for analysis

3. Combining data for analysis

4. Cleaning data for analysis

5. Case study