Web Scraping in R

Learn how to efficiently collect and download data from any website using R.
4 Hours, 13 Videos, 45 Exercises
3600 XP

Course Description

Have you ever come across a website that displays a lot of data, such as statistics, product reviews, or prices, in a format that isn't ready for data analysis? Authorities and other data providers often publish their data in neatly formatted tables, but not all of these sites include a download button. Don't despair: in this course, you'll learn how to efficiently collect and download data from any website using R. You'll learn how to automate the scraping and parsing of Wikipedia using the rvest and httr packages. Through hands-on exercises, you'll also expand your understanding of HTML and CSS, the building blocks of web pages, as you make your data harvesting workflows less error-prone and more efficient.

  1. Introduction to HTML and Web Scraping

    Free
    In this chapter, you'll be introduced to HyperText Markup Language (HTML), a declarative language used to structure modern websites. Using the rvest package, you'll learn how to query simple HTML elements and scrape your first table.
  2. Navigation and Selection with CSS

    Cascading Style Sheets (CSS) describe how HTML elements are displayed on a web page, including colors, fonts, and general layout. In this chapter, you'll learn why CSS selectors and combinators are a crucial ingredient for web scraping.
  3. Advanced Selection with XPath

    The CSS selectors you got to know in the previous chapter are powerful but have their limitations. For example, they fall short when you want to select nodes based on the properties of their descendants. XPath to the rescue! Using this query language, you can navigate and scrape even the most hideous HTML.
  4. Scraping Best Practices

    Now that you know how to extract content from web pages, it's time to look behind the curtain. In this final chapter, you'll learn why HTTP requests are the foundation of every scraping action and how they can be customized to comply with best practices in web scraping.
Collaborators
Maggie Matsui, Amy Peterson
Prerequisites
Intermediate R, Introduction to the Tidyverse

Timo Grossenbacher

Project Lead, Automated Journalism at Tamedia
Timo Grossenbacher is a project lead for automated journalism at Swiss publisher Tamedia. Before that, he was a data journalist with the Swiss Public Broadcast (SRF), where he used scripting and databases for almost every data-driven story he published. He also teaches data journalism at the University of Zurich and is the creator of rddj.info, a collection of resources for doing data journalism with R. Follow him on Twitter at grssnbchr or visit his personal website.

What do other learners have to say?

I've used other sites—Coursera, Udacity, things like that—but DataCamp's been the one that I've stuck with.

Devon Edwards Joseph
Lloyds Banking Group

DataCamp is the top resource I recommend for learning data science.

Louis Maiden
Harvard Business School

DataCamp is by far my favorite website to learn from.

Ronald Bowers
Decision Science Analytics, USAA
