Stanley Mwangi has completed

Exploring and Analyzing Data in Python

4 hours

4,150 XP

Course Description

How do we get from data to answers? Exploratory data analysis is a process for exploring datasets, answering questions, and visualizing results. This course presents the tools you need to clean and validate data, to visualize distributions and relationships between variables, and to use regression models to predict and explain. You'll explore data related to demographics and health, including the National Survey of Family Growth and the General Social Survey, but the methods you learn apply to all areas of science, engineering, and business. You'll use Pandas, a powerful library for working with data, and other core Python libraries including NumPy and SciPy, StatsModels for regression, and Matplotlib for visualization. With these tools and skills, you will be prepared to work with real data, make discoveries, and present compelling results.

1
Read, clean, and validate
The first step of almost any data project is to read the data, check for errors and special cases, and prepare data for analysis. This is exactly what you'll do in this chapter, while working with a dataset obtained from the National Survey of Family Growth.
DataFrames and Series
50 xp
Read the codebook
50 xp
Exploring the NSFG data
100 xp
Clean and Validate
50 xp
Validate a variable
50 xp
Clean a variable
100 xp
Compute a variable
100 xp
Filter and visualize
50 xp
Make a histogram
100 xp
Compute birth weight
100 xp
Filter
100 xp
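The validate, clean, compute, and filter steps listed in this chapter can be sketched in Pandas as follows. This is a minimal illustration, not the course's exercise code: the toy DataFrame, the column names `pounds` and `ounces`, and the sentinel codes 98 and 99 are assumptions made for the example, and the values are not real NSFG records.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, NOT real NSFG records.
# Assumption: 98 and 99 are sentinel codes ("refused" / "don't know").
df = pd.DataFrame({
    "pounds": [7, 8, 99, 6, 98, 7],
    "ounces": [8, 0, 99, 4, 98, 12],
})

# Validate: count each value to spot special codes among plausible ones.
print(df["pounds"].value_counts().sort_index())

# Clean: replace sentinel codes with NaN so statistics ignore them.
pounds = df["pounds"].replace([98, 99], np.nan)
ounces = df["ounces"].replace([98, 99], np.nan)

# Compute: combine the two columns into one weight in pounds.
birth_weight = pounds + ounces / 16.0

# Filter: keep only the rows that have a valid weight.
valid = birth_weight.dropna()
mean_weight = valid.mean()
print(mean_weight)
```

Replacing the codes before computing matters: leaving 98 and 99 in place would treat them as real weights and inflate the mean.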
2
Distributions
In the first chapter, having cleaned and validated your data, you began exploring it by using histograms to visualize distributions. In this chapter, you'll learn how to represent distributions using Probability Mass Functions (PMFs) and Cumulative Distribution Functions (CDFs). You'll learn when to use each of them, and why, while working with a new dataset obtained from the General Social Survey.
Probability mass functions
50 xp
Make a PMF
100 xp
Plot a PMF
100 xp
Cumulative distribution functions
50 xp
Make a CDF
100 xp
Compute IQR
100 xp
Plot a CDF
100 xp
Comparing distributions
50 xp
Distribution of education
50 xp
Extract education levels
100 xp
Plot income CDFs
100 xp
Modeling distributions
50 xp
Distribution of income
100 xp
Comparing CDFs
100 xp
Comparing PDFs
100 xp
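As a rough sketch of the two representations this chapter names, a PMF and a CDF can be built from a sample with NumPy alone. The sample below is a hypothetical toy array, not GSS data, and `np.percentile` is used for the interquartile range as one reasonable choice; the course may compute these quantities with different tools.

```python
import numpy as np

# Hypothetical toy sample (for example, years of education); not GSS data.
values = np.array([12, 12, 14, 16, 12, 16, 18, 14, 12, 16])

# PMF: the probability of each distinct value.
xs, counts = np.unique(values, return_counts=True)
pmf = counts / counts.sum()

# CDF: the probability of a value less than or equal to each x.
cdf = np.cumsum(pmf)

# IQR: distance between the 75th and 25th percentiles.
q25, q75 = np.percentile(values, [25, 75])
iqr = q75 - q25

for x, p, c in zip(xs, pmf, cdf):
    print(x, round(p, 2), round(c, 2))
print("IQR:", iqr)
```

The PMF shows how often each value occurs, while the CDF accumulates those probabilities, which tends to make it smoother and easier to use when comparing distributions.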
3
Relationships
Up to this point, you've only looked at one variable at a time. In this chapter, you'll explore relationships between variables two at a time, using scatter plots and other visualizations to extract insights from a new dataset obtained from the Behavioral Risk Factor Surveillance System (BRFSS). You'll also learn how to quantify those relationships using correlation and simple regression.
Exploring relationships
50 xp
PMF of age
100 xp
Scatter plot
100 xp
Jittering
100 xp
Visualizing relationships
50 xp
Height and weight
100 xp
Distribution of income
100 xp
Income and height
100 xp
Correlation
50 xp
Computing correlations
100 xp
Interpreting correlations
50 xp
Simple regression
50 xp
Income and vegetables
100 xp
Fit a line
100 xp
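The ideas of jittering, correlation, and fitting a line can be illustrated with a short NumPy sketch. The data here are synthetic and generated with a known slope, so they are not BRFSS values; the variable names and the use of `np.corrcoef` and `np.polyfit` are assumptions for illustration rather than the course's own approach.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, not BRFSS: y depends linearly on x plus noise.
x = rng.uniform(150, 200, size=500)              # hypothetical heights (cm)
y = 0.9 * x - 80 + rng.normal(0, 5, size=500)    # hypothetical weights (kg)

# Jittering: survey values are often rounded, so points pile up in columns.
# Adding small random noise spreads them out for a scatter plot.
x_rounded = np.round(x, -1)                      # round to the nearest 10
x_jittered = x_rounded + rng.normal(0, 2, size=x.size)

# Correlation: strength of the linear relationship, between -1 and 1.
r = np.corrcoef(x, y)[0, 1]

# Simple regression: least-squares line y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)

print("correlation:", r)
print("slope:", slope, "intercept:", intercept)
```

Correlation and slope answer different questions: the first measures how tightly the points follow a line, the second how much `y` changes per unit of `x`.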
4
Multivariate Thinking
Explore multivariate relationships using multiple regression to describe non-linear relationships and logistic regression to explain and predict binary variables.
Limits of simple regression
50 xp
Regression and causation
50 xp
Using StatsModels
100 xp
Multiple regression
50 xp
Plot income and education
100 xp
Non-linear model of education
100 xp
Visualizing regression results
50 xp
Making predictions
100 xp
Visualizing predictions
100 xp
Logistic regression
50 xp
Predicting a binary variable
100 xp
Next steps
50 xp
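One way to see how multiple regression can describe a non-linear relationship is to add a squared term as a second explanatory variable. The sketch below uses synthetic data and plain NumPy least squares instead of StatsModels, purely to stay dependency-light; the coefficients used to generate the data are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Synthetic data: the outcome is a curved function of x plus noise.
x = rng.uniform(0, 20, size=n)
y = 2.0 + 1.5 * x - 0.05 * x**2 + rng.normal(0, 1, size=n)

# Design matrix with an intercept, x, and x squared. The model is still
# linear in its coefficients, so ordinary least squares applies.
X = np.column_stack([np.ones(n), x, x**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Making predictions: evaluate the fitted curve at new values of x.
x_new = np.array([5.0, 10.0, 15.0])
X_new = np.column_stack([np.ones(x_new.size), x_new, x_new**2])
pred = X_new @ coef

print("intercept, linear, quadratic:", coef)
print("predictions:", pred)
```

A negative coefficient on the squared term means the fitted curve bends downward, which a single straight line could not capture.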

Datasets

National Survey of Family Growth (NSFG)

General Social Survey (GSS)

Behavioral Risk Factor Surveillance System (BRFSS)

Collaborators

Chester Ismay

Yashas Roy

Prerequisites

Python Data Science Toolbox (Part 2)

Allen Downey

Professor, Olin College
