Web Scraping in R

4 hours
3,600 XP

Course Description

Have you ever come across a website that displays a lot of data, such as statistics, product reviews, or prices, in a format that's not ready for analysis? Authorities and other data providers often publish their data in neatly formatted tables, but not every site includes a download button. Don't despair: in this course, you'll learn how to efficiently collect and download data from almost any website using R. You'll learn how to automate the scraping and parsing of Wikipedia pages using the rvest and httr packages. Through hands-on exercises, you'll also deepen your understanding of HTML and CSS, the building blocks of web pages, as you make your data harvesting workflows less error-prone and more efficient.
  1.

    Introduction to HTML and Web Scraping


    In this chapter, you'll be introduced to HyperText Markup Language (HTML), the declarative language used to structure modern websites. Using the rvest package, you'll learn how to query simple HTML elements and scrape your first table.
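The first steps described above can be sketched with rvest. This is a minimal, self-contained example that parses an inline snippet via minimal_html() instead of a live page; the HTML content itself is invented for illustration.

```r
# Minimal sketch of the chapter's first steps: read in HTML and
# query a simple element. minimal_html() stands in for a live page;
# the snippet below is invented for illustration.
library(rvest)

html <- minimal_html('
  <h1>Web scraping</h1>
  <p class="intro">Scraping automates the collection of web data.</p>
')

# Query the <h1> element by tag name and extract its text
heading <- html %>% html_element("h1") %>% html_text()
```

With a real page you would replace minimal_html() with read_html("https://..."), which downloads and parses the document in one step.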

    Introduction to HTML
    50 XP
    Read in HTML
    100 XP
    Beware of syntax errors!
    50 XP
    Navigating HTML
    50 XP
    Select all children of a list
    100 XP
    Parse hyperlinks into a data frame
    100 XP
    Scrape your first table
    50 XP
    The right order of table elements
    100 XP
    Turn a table into a data frame with html_table()
    100 XP
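The final lessons in the list above can be sketched as follows, again on an invented HTML snippet: hyperlinks are collected into a data frame from their link text and href attributes, and html_table() converts an HTML table directly into a data frame.

```r
# Sketch of the chapter's closing lessons: parse hyperlinks into a
# data frame, then turn a table into a data frame with html_table().
# The HTML snippet is invented for illustration.
library(rvest)

html <- minimal_html('
  <a href="https://www.r-project.org">R</a>
  <a href="https://rvest.tidyverse.org">rvest</a>
  <table>
    <tr><th>package</th><th>downloads</th></tr>
    <tr><td>rvest</td><td>100</td></tr>
    <tr><td>httr</td><td>200</td></tr>
  </table>')

# Parse hyperlinks into a data frame: one column for the link text,
# one for the URL stored in the href attribute
links <- html %>% html_elements("a")
link_df <- data.frame(
  text = links %>% html_text(),
  url  = links %>% html_attr("href")
)

# Turn the table into a data frame; html_table() uses the <th> row
# as column headers
pkg_df <- html %>% html_element("table") %>% html_table()
```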
  4.

    Scraping Best Practices

    Now that you know how to extract content from web pages, it's time to look behind the curtain. In this final chapter, you'll learn why HTTP requests are the foundation of every scraping action and how to customize them to comply with web scraping best practices.

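As a sketch of the practices this chapter covers, the request below identifies the scraper through a custom User-Agent header via httr, checks the status code before parsing, and pauses between requests. The URL and contact address are placeholders, not part of the course.

```r
# Hedged sketch: customizing an HTTP request with httr.
library(httr)
library(rvest)

# Identify yourself so the site operator can contact you if needed
response <- GET(
  "https://example.com",
  user_agent("polite-scraper/0.1; contact: you@example.com")
)

# Only parse the body if the request succeeded
if (status_code(response) == 200) {
  page <- read_html(content(response, as = "text", encoding = "UTF-8"))
}

# Throttle: pause between successive requests to avoid overloading
# the server
Sys.sleep(1)
```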

In the following tracks

R Developer


Amy Peterson
Maggie Matsui
Timo Grossenbacher

Head of Newsroom Automation at Tamedia

Timo Grossenbacher is Head of Newsroom Automation at the Swiss publisher Tamedia. Before that, he was a data journalist at the Swiss public broadcaster SRF, where he used scripting and databases for almost every data-driven story he published. He also teaches data journalism at the University of Zurich and has created resources for doing data journalism with R. Follow him at @grssnbchr on Twitter or visit his personal website.
