Working with Web Data in R
Learn how to efficiently import data from the web into R.
4 Hours, 16 Videos, 56 Exercises, 17,744 Learners, 4500 XP
Course Description
Most of the useful data in the world, from economic data to news content to geographic information, lives somewhere on the internet - and this course will teach you how to access it. You'll explore how to work with APIs (computer-readable interfaces to websites), access data from Wikipedia and other sources, and build your own simple API client. For those occasions where APIs are not available, you'll find out how to use R to scrape information out of web pages. In the process you'll learn how to get data out of even the most stubborn website, and how to turn it into a format ready for further analysis. The packages you'll use and learn your way around are rvest, httr, xml2 and jsonlite, along with particular API client packages like WikipediR and pageviews.
- 1
Downloading Files and Using API Clients
Sometimes getting data off the internet is very simple: it's stored in a format that R can handle and just lives on a server somewhere. Other times it's in a more complex format and perhaps part of an API, but there's an R package designed to make using it a piece of cake. This chapter explores how to download and read in static files, and how to use APIs when pre-existing clients are available.
- 2
Using httr to interact with APIs directly
If an API client doesn't exist, it's up to you to communicate directly with the API. But don't worry, the package `httr` makes this really straightforward. In this chapter you'll learn how to make web requests from R, how to examine the responses you get back and some best practices for doing this in a responsible way.
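As a rough illustration of what this chapter covers, here is a minimal sketch of a polite request helper built on `httr`. The helper names (`fetch_text`, `fetch_all`), the user-agent string, and the httpbin.org URL are placeholders chosen for this example, not material from the course itself.

```r
library(httr)

# Hypothetical helper: send a GET request with a user agent and
# return the response body as text. Fails loudly on HTTP errors.
fetch_text <- function(url, query = list()) {
  resp <- GET(
    url,
    query = query,
    # Placeholder contact string; identify yourself to the API owner
    user_agent("my-email@example.org learning httr")
  )
  # Turn 4xx/5xx responses into R errors instead of silently continuing
  stop_for_status(resp)
  content(resp, as = "text", encoding = "UTF-8")
}

# Hypothetical helper: crude rate limiting by pausing before each request
fetch_all <- function(urls, pause = 1) {
  lapply(urls, function(u) {
    Sys.sleep(pause)
    fetch_text(u)
  })
}

# Constructing a query URL needs no network access
request_url <- modify_url(
  "https://httpbin.org/get",
  query = list(page = "R", limit = 5)
)
```

Calling `fetch_text("https://httpbin.org/get", query = list(page = "R"))` would perform the actual request; it is left uncalled here so the sketch runs offline.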
  - GET and POST requests in theory (50 xp)
  - GET requests in practice (100 xp)
  - POST requests in practice (100 xp)
  - Extracting the response (100 xp)
  - Multiple Choice: GET and POST requests (50 xp)
  - Graceful httr (50 xp)
  - Handling http failures (100 xp)
  - Constructing queries (Part I) (100 xp)
  - Constructing queries (Part II) (100 xp)
  - Respectful API usage (50 xp)
  - Using user agents (100 xp)
  - Rate-limiting (100 xp)
  - Tying it all together (100 xp)
- 3
Handling JSON and XML
Sometimes data is a TSV or nice plaintext output. Sometimes it's XML and/or JSON. This chapter walks you through what JSON and XML are, how to convert them into R-like objects, and how to extract data from them. You'll practice by examining the revision history for a Wikipedia article retrieved from the Wikipedia API using httr, xml2 and jsonlite.
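To give a feel for the conversions this chapter describes, the sketch below parses a small JSON string with `jsonlite` and an equivalent XML string with `xml2`. Both payloads are made up for illustration and only loosely resemble a revision history; they are not real Wikipedia API output.

```r
library(jsonlite)
library(xml2)

# Made-up JSON payload (an assumption for this example)
json_txt <- '{"revisions": [{"user": "alice", "size": 120}, {"user": "bob", "size": 95}]}'
parsed <- fromJSON(json_txt)
# fromJSON simplifies an array of records into a data frame by default
revs_json <- parsed$revisions

# Made-up XML payload carrying the same information as attributes
xml_txt <- '<api><rev user="alice" size="120">First edit</rev><rev user="bob" size="95">Second edit</rev></api>'
doc <- read_xml(xml_txt)
# XPath "//rev" selects every <rev> element in the document
rev_nodes <- xml_find_all(doc, "//rev")
revs_xml <- data.frame(
  user = xml_attr(rev_nodes, "user"),
  comment = xml_text(rev_nodes),
  stringsAsFactors = FALSE
)
```

In both cases the result is an ordinary data frame with one row per revision, which is usually the most convenient shape for further analysis.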
  - JSON (50 xp)
  - Can you spot JSON? (50 xp)
  - Parsing JSON (100 xp)
  - Manipulating JSON (50 xp)
  - Manipulating parsed JSON (100 xp)
  - Reformatting JSON (100 xp)
  - XML structure (50 xp)
  - Do you understand XML structure? (50 xp)
  - Examining XML documents (100 xp)
  - XPATHs (50 xp)
  - Extracting XML data (100 xp)
  - Extracting XML attributes (100 xp)
  - Wrapup: returning nice API output (100 xp)
- 4
Web scraping with XPATHs
Now that we've covered the low-hanging fruit ("it has an API, and a client", "it has an API") it's time to talk about what to do when a website doesn't have any access mechanisms at all - when you have to rely on web scraping. This chapter will introduce you to the rvest web-scraping package, and build on your previous knowledge of XML manipulation and XPATHs.
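A minimal sketch of XPath-based scraping with `rvest` might look like the following. The HTML snippet is invented for this example and stands in for a page you would normally fetch by passing a URL to `read_html()`.

```r
library(rvest)

# Invented HTML standing in for a downloaded page (an assumption)
html_txt <- '<html><body>
  <table class="infobox"><tr><th>Name</th><td>Ada Lovelace</td></tr></table>
  <p>Other text</p>
</body></html>'
page <- read_html(html_txt)

# XPath: the first <td> anywhere inside a table whose class is "infobox"
cell <- html_node(page, xpath = "//table[@class='infobox']//td")
name <- html_text(cell)
```

The same XPath ideas used for XML documents carry over, since `rvest` builds on `xml2` for parsing.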
- 5
CSS Web Scraping and Final Case Study
CSS selector-based web scraping is a far more pleasant alternative to using XPATHs. You'll start this chapter by learning about CSS and how to leverage it for web scraping. Then, you'll work through a final case study that combines everything you've learnt so far to write a function that queries an API, parses the response and returns data in a nice form.
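For comparison with XPath, here is a small sketch of selecting nodes by CSS selector with `rvest`. The HTML fragment and the class name `lang` are made up for illustration.

```r
library(rvest)

# Invented HTML fragment (an assumption for this example)
html_txt <- '<ul id="langs"><li class="lang">R</li><li class="lang">Python</li><li>not a language</li></ul>'
page <- read_html(html_txt)

# CSS selector "li.lang" matches <li> elements with class "lang"
lang_nodes <- html_nodes(page, css = "li.lang")
langs <- html_text(lang_nodes)
```

The selector `li.lang` expresses the same query as the XPath `//li[@class='lang']` in a shorter form, which is the main appeal of the CSS approach.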
Prerequisites
Intermediate R
Charlotte Wickham
Assistant Professor at Oregon State University
Charlotte is an Assistant Professor in the Department of Statistics at Oregon State University and an avid R programmer with a passion for teaching. Her interests lie in spatiotemporal data, statistical graphics and computing, and environmental statistics.

Oliver Keyes
Data Scientist
Oliver is a long-time data scientist and currently works as a Ph.D. student and instructor at the University of Washington. They are the developer of over 30 R packages, including many standard API clients.