Working with Web Data in R

4 hours
4,500 XP

Course Description

Most of the useful data in the world, from economic data to news content to geographic information, lives somewhere on the internet - and this course will teach you how to access it. You'll explore how to work with APIs (computer-readable interfaces to websites), access data from Wikipedia and other sources, and build your own simple API client. For those occasions where APIs are not available, you'll find out how to use R to scrape information out of web pages. In the process you'll learn how to get data out of even the most stubborn website, and how to turn it into a format ready for further analysis. The packages you'll use and learn your way around are rvest, httr, xml2 and jsonlite, along with particular API client packages like WikipediR and pageviews.
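
As a rough illustration of what a "simple API client" might look like, here is a minimal sketch using httr against the Wikipedia (MediaWiki) API. The helper names `build_revisions_url` and `get_revisions`, the user agent string, and the article title are assumptions made up for this example; they are not taken from the course material.

```r
library(httr)

# Hypothetical helper (not from the course): build a request URL for the
# MediaWiki API without sending anything over the network.
build_revisions_url <- function(title, limit = 5) {
  modify_url(
    "https://en.wikipedia.org",
    path = "w/api.php",
    query = list(
      action  = "query",
      prop    = "revisions",
      titles  = title,
      rvlimit = limit,
      format  = "json"
    )
  )
}

# Hypothetical helper: send the request and return the parsed body.
# The user agent value is a placeholder; identify yourself in real use.
get_revisions <- function(title, limit = 5) {
  resp <- GET(
    build_revisions_url(title, limit),
    user_agent("example-client (me@example.com)")
  )
  stop_for_status(resp)         # turn HTTP error codes into R errors
  content(resp, as = "parsed")  # a nested list when the response is JSON
}

# Building the URL works offline; get_revisions() needs network access.
url <- build_revisions_url("Hadley Wickham")
print(url)
```

Keeping URL construction separate from the request makes the client easier to check without touching the network; calling `get_revisions("Hadley Wickham")` would then perform the actual request.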

  1. Downloading Files and Using API Clients

    Free

    Sometimes getting data off the internet is very simple: it's stored in a format that R can handle and just lives on a server somewhere. Other times it's in a more complex format, perhaps behind an API, but there's an R package designed to make using it a piece of cake. This chapter explores how to download and read in static files, and how to use APIs when pre-existing clients are available.

    Introduction: Working With Web Data in R (50 xp)
    Downloading files and reading them into R (100 xp)
    Saving raw files to disk (100 xp)
    Saving formatted files to disk (100 xp)
    Understanding Application Programming Interfaces (50 xp)
    API test (50 xp)
    Using API clients (100 xp)
    Access tokens and APIs (50 xp)
    Using access tokens (100 xp)
  2. Using httr to interact with APIs directly

    If an API client doesn't exist, it's up to you to communicate directly with the API. But don't worry, the package `httr` makes this really straightforward. In this chapter you'll learn how to make web requests from R, how to examine the responses you get back and some best practices for doing this in a responsible way.
  3. Handling JSON and XML

    Sometimes data is a TSV or nice plaintext output. Sometimes it's XML and/or JSON. This chapter walks you through what JSON and XML are, how to convert them into R-like objects, and how to extract data from them. You'll practice by examining the revision history for a Wikipedia article retrieved from the Wikipedia API using httr, xml2 and jsonlite.
  4. Web scraping with XPATHs

    Now that we've covered the low-hanging fruit ("it has an API, and a client", "it has an API") it's time to talk about what to do when a website doesn't have any access mechanisms at all - when you have to rely on web scraping. This chapter will introduce you to the rvest web-scraping package, and build on your previous knowledge of XML manipulation and XPATHs.
  5. CSS Web Scraping and Final Case Study

    CSS path-based web scraping is a far more pleasant alternative to using XPATHs. You'll start this chapter by learning about CSS and how to leverage it for web scraping. Then you'll work through a final case study that combines everything you've learnt so far to write a function that queries an API, parses the response and returns data in a nice form.

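The parsing steps outlined in chapters 3 to 5 can be sketched offline with jsonlite, xml2 and rvest. The sample payloads below are made up for illustration and only stand in for real API responses and web pages; the variable names are likewise assumptions, not the course's own code.

```r
library(jsonlite)
library(xml2)
library(rvest)

# Made-up sample payloads (assumptions, not real API output).
json_txt <- '{"revisions": [{"user": "alice", "size": 120}, {"user": "bob", "size": 95}]}'
xml_txt  <- '<api><rev user="alice" size="120"/><rev user="bob" size="95"/></api>'
html_txt <- '<html><body><table class="infobox"><tr><th>Name</th><td class="fn">Example</td></tr></table></body></html>'

# JSON: fromJSON() simplifies an array of records into a data frame.
revs_json <- fromJSON(json_txt)$revisions

# XML: select nodes with an XPath expression, then pull out attributes.
rev_nodes <- xml_find_all(read_xml(xml_txt), "//rev")
revs_xml <- data.frame(
  user = xml_attr(rev_nodes, "user"),
  size = as.integer(xml_attr(rev_nodes, "size")),
  stringsAsFactors = FALSE
)

# HTML: select the same table cell once by XPath and once by CSS selector.
page <- read_html(html_txt)
by_xpath <- html_text(
  html_nodes(page, xpath = "//table[@class='infobox']//td[@class='fn']")
)
by_css <- html_text(html_nodes(page, "table.infobox td.fn"))

print(revs_json)
print(revs_xml)
print(c(xpath = by_xpath, css = by_css))
```

The XPath and CSS queries target the same element, which gives a feel for why the CSS form is often considered easier to read.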

Collaborators

Richie Cotton

Prerequisites

Intermediate R
Charlotte Wickham

Assistant Professor at Oregon State University

Charlotte is an Assistant Professor in the Department of Statistics at Oregon State University and an avid R programmer with a passion for teaching. Her interests lie in spatiotemporal data, statistical graphics and computing, and environmental statistics.
Oliver Keyes

Data Scientist

Oliver is a long-time data scientist and currently works as a Ph.D. student and instructor at the University of Washington. They are the developer of over 30 R packages, including many standard API clients.