# Exploratory Data Analysis in R

Learn how to use graphical and numerical techniques to begin uncovering the structure of your data.

4 Hours | 15 Videos | 54 Exercises | 76,643 Learners | 3950 XP | Data Analyst Track | Data Scientist Track

## Course Description

When your dataset is represented as a table or a database, it's difficult to observe much about it beyond its size and the types of variables it contains. In this course, you'll learn how to use graphical and numerical techniques to begin uncovering the structure of your data. Which variables suggest interesting relationships? Which observations are unusual? By the end of the course, you'll be able to answer these questions and more, while generating graphics that are both insightful and beautiful.

### 1. Exploring Categorical Data

Free

In this chapter, you will learn how to create graphical and numerical summaries of two categorical variables.

- Exploring categorical data (50 XP)
- Bar chart expectations (50 XP)
- Contingency table review (100 XP)
- Dropping levels (100 XP)
- Side-by-side bar charts (100 XP)
- Bar chart interpretation (50 XP)
- Counts vs. proportions (50 XP)
- Conditional proportions (50 XP)
- Counts vs. proportions (2) (100 XP)
- Distribution of one variable (50 XP)
- Marginal bar chart (100 XP)
- Conditional bar chart (100 XP)
- Improve pie chart (100 XP)

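
Several of the topics listed for this chapter (contingency tables, dropping levels, counts versus conditional proportions, side-by-side bar charts) can be sketched in base R. This is only a minimal illustration and not taken from the course: the `survey` data frame and its columns `group` and `answer` are hypothetical toy data made up for this example.

```r
# Hypothetical toy data (not from the course): two categorical variables.
survey <- data.frame(
  group  = factor(c("A", "A", "A", "B", "B", "B", "B", "C"),
                  levels = c("A", "B", "C", "D")),  # level "D" never occurs
  answer = factor(c("yes", "no", "yes", "no", "no", "yes", "no", "yes"))
)

# Drop the unused level "D" so it does not clutter tables or plots.
survey <- droplevels(survey)

# Contingency table of counts: rows are groups, columns are answers.
counts <- table(survey$group, survey$answer)
print(counts)

# Conditional proportions: margin = 1 conditions on the row variable,
# so the proportions within each group sum to 1.
row_props <- prop.table(counts, margin = 1)
print(row_props)

# Side-by-side bar chart of the counts using base graphics.
barplot(t(counts), beside = TRUE, legend.text = TRUE,
        xlab = "group", ylab = "count")
```

Comparing `counts` with `row_props` shows why the distinction matters: group C has only one observation, so its proportion looks extreme even though the count behind it is tiny.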
### 2. Exploring Numerical Data

In this chapter, you will learn how to graphically summarize numerical data.

### 3. Numerical Summaries

Now that we've looked at exploring categorical and numerical data, you'll learn some useful statistics for describing distributions of data.
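
As a rough sketch of what such statistics look like in practice, the base R snippet below computes two common measures of center and two of spread. The vector `x` is hypothetical toy data, and the choice of statistics is an assumption for illustration; the chapter description does not say which ones it covers.

```r
# Hypothetical toy values (not from the course), including one
# unusually large observation.
x <- c(2, 3, 3, 4, 5, 5, 6, 40)

# Measures of center: the mean is pulled toward the large value,
# while the median is not.
center <- c(mean = mean(x), median = median(x))

# Measures of spread: the standard deviation is sensitive to the
# large value, while the interquartile range is more robust.
spread <- c(sd = sd(x), IQR = IQR(x))

print(center)
print(spread)
```

Running the same calls on `x[-8]` (the data without the large value) is a quick way to see how much a single unusual observation moves each statistic.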

### 4. Case Study

Apply what you've learned to explore and summarize a real-world dataset in this case study of email spam.

In the following tracks

Data Analyst, Data Scientist

Collaborators

#### Andrew Bray

Assistant Professor of Statistics at Reed College

Andrew Bray is an assistant professor of statistics at Reed College. His interests are in computing, differential privacy, environmental statistics, and statistics education. He is a co-author of the infer package for tidy statistical inference.