Frank ZHANG has completed

Working with Web Data in R

4 hr

4,500 XP

Loved by learners at thousands of companies

Course Description

Most of the useful data in the world, from economic data to news content to geographic information, lives somewhere on the internet - and this course will teach you how to access it. You'll explore how to work with APIs (computer-readable interfaces to websites), access data from Wikipedia and other sources, and build your own simple API client. For those occasions where APIs are not available, you'll find out how to use R to scrape information out of web pages. In the process you'll learn how to get data out of even the most stubborn website, and how to turn it into a format ready for further analysis. The packages you'll use and learn your way around are rvest, httr, xml2 and jsonlite, along with particular API client packages like WikipediR and pageviews.

1
Downloading Files and Using API Clients
Free
Sometimes getting data off the internet is very, very simple - it's stored in a format that R can handle and just lives on a server somewhere, or it's in a more complex format and perhaps part of an API but there's an R package designed to make using it a piece of cake. This chapter will explore how to download and read in static files, and how to use APIs when pre-existing clients are available.
Introduction: Working With Web Data in R
50 xp
Downloading files and reading them into R
100 xp
Saving raw files to disk
100 xp
Saving formatted files to disk
100 xp
Understanding Application Programming Interfaces
50 xp
API test
50 xp
Using API clients
100 xp
Access tokens and APIs
50 xp
Using access tokens
100 xp
2
Using httr to interact with APIs directly
If an API client doesn't exist, it's up to you to communicate directly with the API. But don't worry, the package httr makes this really straightforward. In this chapter you'll learn how to make web requests from R, how to examine the responses you get back and some best practices for doing this in a responsible way.
GET and POST requests in theory
50 xp
GET requests in practice
100 xp
POST requests in practice
100 xp
Extracting the response
100 xp
Multiple Choice: GET and POST requests
50 xp
Graceful httr
50 xp
Handling http failures
100 xp
Constructing queries (Part I)
100 xp
Constructing queries (Part II)
100 xp
Respectful API usage
50 xp
Using user agents
100 xp
Rate-limiting
100 xp
Tying it all together
100 xp
3
Handling JSON and XML
Sometimes data is a TSV or nice plaintext output. Sometimes it's XML and/or JSON. This chapter walks you through what JSON and XML are, how to convert them into R-like objects, and how to extract data from them. You'll practice by examining the revision history for a Wikipedia article retrieved from the Wikipedia API using httr, xml2 and jsonlite.
JSON
50 xp
Can you spot JSON?
50 xp
Parsing JSON
100 xp
Manipulating JSON
50 xp
Manipulating parsed JSON
100 xp
Reformatting JSON
100 xp
XML structure
50 xp
Do you understand XML structure?
50 xp
Examining XML documents
100 xp
XPATHs
50 xp
Extracting XML data
100 xp
Extracting XML attributes
100 xp
Wrapup: returning nice API output
100 xp
4
Web scraping with XPATHs
Now that we've covered the low-hanging fruit ("it has an API, and a client", "it has an API") it's time to talk about what to do when a website doesn't have any access mechanisms at all - when you have to rely on web scraping. This chapter will introduce you to the rvest web-scraping package, and build on your previous knowledge of XML manipulation and XPATHs.
Web scraping 101
50 xp
Reading HTML
100 xp
Extracting nodes by XPATH
100 xp
HTML structure
50 xp
Extracting names
100 xp
Extracting values
100 xp
Test: HTML reading and extraction
50 xp
Reformatting Data
50 xp
Extracting tables
100 xp
Cleaning a data frame
100 xp
5
CSS Web Scraping and Final Case Study
CSS path-based web scraping is a far more pleasant alternative to using XPATHs. You'll start this chapter by learning about CSS and how to leverage it for web scraping. Then you'll work through a final case study that combines everything you've learnt so far to write a function that queries an API, parses the response and returns data in a nice form.
CSS web scraping in theory
50 xp
Using CSS to scrape nodes
100 xp
Scraping names
100 xp
Scraping text
100 xp
Test: CSS web scraping
50 xp
Final case study: Introduction
50 xp
API calls
100 xp
Extracting information
100 xp
Normalising information
100 xp
Reproducibility
100 xp
Wrap Up
50 xp

collaborators

Richie Cotton

prerequisites

Intermediate R

Charlotte Wickham

Assistant Professor at Oregon State University

Charlotte is an Assistant Professor in the Department of Statistics at Oregon State University and an avid R programmer with a passion for teaching. Her interests lie in spatiotemporal data, statistical graphics and computing, and environmental statistics.

Oliver Keyes

Data Scientist

Oliver is a long-time data scientist and currently works as a Ph.D. student and instructor at the University of Washington. They are the developer of over 30 R packages, including many standard API clients.
