Course Description
When your dataset is represented as a table or a database, it's difficult to observe much about it beyond its size and the types of variables it contains. In this course, you'll learn how to use graphical and numerical techniques to begin uncovering the structure of your data. Which variables suggest interesting relationships? Which observations are unusual? By the end of the course, you'll be able to answer these questions and more, while generating graphics that are both insightful and beautiful.
1. Exploring Categorical Data (Free)
In this chapter, you will learn how to create graphical and numerical summaries of two categorical variables.
- Exploring categorical data (50 xp)
- Bar chart expectations (50 xp)
- Contingency table review (100 xp)
- Dropping levels (100 xp)
- Side-by-side bar charts (100 xp)
- Bar chart interpretation (50 xp)
- Counts vs. proportions (50 xp)
- Conditional proportions (50 xp)
- Counts vs. proportions (2) (100 xp)
- Distribution of one variable (50 xp)
- Marginal bar chart (100 xp)
- Conditional bar chart (100 xp)
- Improve pie chart (100 xp)
2. Exploring Numerical Data
In this chapter, you will learn how to graphically summarize numerical data.
- Exploring numerical data (50 xp)
- Faceted histogram (100 xp)
- Boxplots and density plots (100 xp)
- Compare distribution via plots (50 xp)
- Distribution of one variable (50 xp)
- Marginal and conditional histograms (100 xp)
- Marginal and conditional histograms interpretation (50 xp)
- Three binwidths (100 xp)
- Three binwidths interpretation (50 xp)
- Box plots (50 xp)
- Box plots for outliers (100 xp)
- Plot selection (100 xp)
- Visualization in higher dimensions (50 xp)
- 3 variable plot (100 xp)
- Interpret 3 var plot (50 xp)
3. Numerical Summaries
Now that we've looked at exploring categorical and numerical data, you'll learn some useful statistics for describing distributions of data.
- Measures of center (50 xp)
- Choice of center measure (50 xp)
- Calculate center measures (100 xp)
- Measures of variability (50 xp)
- Choice of spread measure (50 xp)
- Calculate spread measures (100 xp)
- Choose measures for center and spread (100 xp)
- Shape and transformations (50 xp)
- Describe the shape (50 xp)
- Transformations (100 xp)
- Outliers (50 xp)
- Identify outliers (100 xp)
4. Case Study
Apply what you've learned to explore and summarize a real world dataset in this case study of email spam.
- Introducing the data (50 xp)
- Spam and num_char (100 xp)
- Spam and num_char interpretation (50 xp)
- Spam and !!! (100 xp)
- Spam and !!! interpretation (50 xp)
- Check-in 1 (50 xp)
- Collapsing levels (100 xp)
- Image and spam interpretation (50 xp)
- Data Integrity (100 xp)
- Answering questions with chains (100 xp)
- Check-in 2 (50 xp)
- What's in a number? (100 xp)
- What's in a number interpretation (50 xp)
- Conclusion (50 xp)
Collaborators
Andrew Bray
Assistant Professor of Statistics at Reed College
Andrew Bray is an assistant professor of statistics at Reed College. His interests are in computing, differential privacy, environmental statistics, and statistics education. He is a co-author of the infer package for tidy statistical inference.