
Working with Web Data in R

Learn how to efficiently import data from the web into R.

4 Hours | 16 Videos | 56 Exercises | 17,744 Learners | 4500 XP



Course Description

Most of the useful data in the world, from economic data to news content to geographic information, lives somewhere on the internet, and this course will teach you how to access it. You'll explore how to work with APIs (computer-readable interfaces to websites), access data from Wikipedia and other sources, and build your own simple API client. For those occasions where APIs are not available, you'll find out how to use R to scrape information out of web pages. In the process you'll learn how to get data out of even the most stubborn website, and how to turn it into a format ready for further analysis. You'll work with the rvest, httr, xml2 and jsonlite packages, along with specific API client packages such as WikipediR and pageviews.

  1. Downloading Files and Using API Clients


    Sometimes getting data off the internet is very simple: it's stored in a format R can handle and just lives on a server somewhere, or it's in a more complex format, perhaps behind an API, but there's an R package that makes using it a piece of cake. This chapter explores how to download and read in static files, and how to use APIs when pre-existing clients are available.

    Introduction: Working With Web Data in R (50 xp)
    Downloading files and reading them into R (100 xp)
    Saving raw files to disk (100 xp)
    Saving formatted files to disk (100 xp)
    Understanding Application Programming Interfaces (50 xp)
    API test (50 xp)
    Using API clients (100 xp)
    Access tokens and APIs (50 xp)
    Using access tokens (100 xp)
  2. Using httr to interact with APIs directly

    If an API client doesn't exist, it's up to you to communicate directly with the API. But don't worry: the `httr` package makes this straightforward. In this chapter you'll learn how to make web requests from R, how to examine the responses you get back, and some best practices for doing this responsibly.

  3. Handling JSON and XML

    Sometimes data arrives as a TSV file or clean plain-text output. Sometimes it's XML and/or JSON. This chapter walks you through what JSON and XML are, how to convert them into R objects, and how to extract data from them. You'll practice by examining the revision history of a Wikipedia article retrieved from the Wikipedia API using httr, xml2 and jsonlite.

  4. Web scraping with XPATHs

    Now that we've covered the low-hanging fruit ("it has an API and a client", "it has an API"), it's time to talk about what to do when a website doesn't have any access mechanism at all and you have to rely on web scraping. This chapter introduces the rvest web-scraping package and builds on your previous knowledge of XML manipulation and XPaths.

  5. CSS Web Scraping and Final Case Study

    CSS path-based web scraping is a far more pleasant alternative to using XPaths. You'll start this chapter by learning about CSS and how to leverage it for web scraping. Then you'll work through a final case study that combines everything you've learned so far to write a function that queries an API, parses the response and returns the data in a usable form.

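The parsing and scraping steps described in chapters 3 to 5 can be sketched in a few lines of R. This is a minimal offline sketch, not material from the course: the JSON, XML and HTML snippets and all variable names are made up for illustration, and it assumes rvest 1.0 or later, where `html_elements()` replaced the older `html_nodes()`.

```r
# Minimal offline sketch; assumes jsonlite, xml2 and rvest (>= 1.0) are installed.
# All JSON, XML and HTML below is hypothetical example data.
library(jsonlite)
library(xml2)
library(rvest)

## JSON -> R: fromJSON() simplifies an array of objects into a data frame
rev_json <- '{"revisions": [
  {"user": "Alice", "size": 1200},
  {"user": "Bob",   "size": 1350}
]}'
revs <- fromJSON(rev_json)$revisions
print(revs)

## XML -> R: select nodes with an XPath expression, then read their attributes
rev_xml <- read_xml('<revisions>
  <rev user="Alice" size="1200">First edit</rev>
  <rev user="Bob" size="1350">Second edit</rev>
</revisions>')
rev_nodes <- xml_find_all(rev_xml, "//rev")
xml_users <- xml_attr(rev_nodes, "user")              # character vector
xml_sizes <- as.numeric(xml_attr(rev_nodes, "size"))  # attributes are strings

## HTML scraping: pick out the same cells with XPath and with a CSS selector
page <- read_html('<html><body>
  <table id="pkgs">
    <tr><td class="name">httr</td><td class="use">HTTP requests</td></tr>
    <tr><td class="name">rvest</td><td class="use">web scraping</td></tr>
  </table>
</body></html>')
names_xpath <- html_text(html_elements(page, xpath = '//table[@id="pkgs"]//td[@class="name"]'))
names_css   <- html_text(html_elements(page, css = "#pkgs td.name"))
print(names_css)
```

Both selectors return the same two cells; the CSS form `#pkgs td.name` is noticeably shorter than its XPath equivalent, which is why chapter 5 presents CSS selectors as the more pleasant option.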


Richie Cotton


Intermediate R

Charlotte Wickham

Assistant Professor at Oregon State University

Charlotte is an Assistant Professor in the Department of Statistics at Oregon State University and an avid R programmer with a passion for teaching. Her interests lie in spatiotemporal data, statistical graphics and computing, and environmental statistics.

Oliver Keyes

Data Scientist

Oliver is a long-time data scientist and currently works as a Ph.D. student and instructor at the University of Washington. They are the developer of over 30 R packages, including many standard API clients.

