Web Scraping in Python

Level: Intermediate
Rating: 4.2+ (45 reviews)

Learn to retrieve and parse information from the internet using the Python library scrapy.

4 hours · 17 videos · 56 exercises
75,106 learners · Statement of Accomplishment

Course Description

The ability to build tools that retrieve and parse information stored across the internet has been, and continues to be, valuable in many areas of data science. In this course, you will learn to navigate and parse HTML code and to build tools that crawl websites automatically. Although our scraping will be done with the versatile Python library scrapy, many of the techniques you learn here also apply to other popular Python libraries, including BeautifulSoup and Selenium. By the end of the course, you will have a strong mental model of HTML structure, be able to build tools that parse HTML code and extract the information you need, and be able to create simple scrapy spiders that crawl the web at scale.
Included in Track

Python Developer
Chapter 1: Introduction to HTML (free)

Learn the structure of HTML. We begin by explaining why web scraping can be a valuable addition to your data science toolbox, then delve into some HTML basics. We end the chapter with a brief introduction to XPath notation, which is used to navigate the elements within HTML code.

    Web Scraping Overview
    Web-scraping is not nonsense!
    HyperText Markup Language
    HTML tree wordy navigation
    From Tree to HTML
    Attributes
    Keep it Classy
    Finding href
    Crash Course in XPath
    Where am I?
    It's Time to P
    A classy span
Chapter 3: CSS Locators, Chaining, and Responses

Learn CSS locator syntax and begin chaining CSS locators together with XPath. We also introduce Response objects, which behave like Selectors but give us extra tools for extending our scraping across multiple websites.

Chapter 4: Spiders

Learn to create web crawlers with scrapy. These scrapy spiders crawl the web through multiple pages, following links and automatically scraping each page using the procedures learned in the previous chapters.

Datasets

DataCamp webpage HTML

Collaborators

David Campos
Mari Nazary
Shon Inouye

Prerequisites

Intermediate Python

Instructor: Thomas Laetsch

Data Scientist at New York University

Since January 2016, Thomas Laetsch has been a Moore-Sloan Post-Doctoral Associate in the Center for Data Science at NYU. In 2012, he received his PhD in mathematics from the University of California, San Diego, specializing in probability, differential geometry, and functional analysis. From 2012 through 2015, he was a Visiting Assistant Professor at the University of Connecticut, working on central tendency theorems for random walks in degenerate spaces.

Don’t just take our word for it

Rating: 4.2 from 45 reviews
5 stars: 62%
4 stars: 13%
3 stars: 18%
2 stars: 4%
1 star: 2%
  • Carlo P.
    about 1 month

    The course was nice, it was well structured and the exercises were really well made. I liked the fact that often it wasn't simple "fill in the blank" type of exercise, but that it was often necessary to write entire lines of code. There are many courses where the exercises are way too easy and it kinda defeats the purpose of the exercises, namely testing your memory and understanding, as well as forcing you to actually apply what was seen in the preceding lectures. This course was not one of them. I wasn't really interested in the topic per se, but the lectures were clear and the exercises were super helpful.

  • Edgar M.
    about 2 months

    Awesome course

  • Vlad R.
    2 months

    I ended up not using spiders themselves, but the course opened the world of scraping to me, so now I can experiment with what works best for me.

  • Thabo L.
    4 months

    It was a well-detailed introduction and I got to fully understand

  • F.J. H.
    6 months

    As I said, great, but there was a lot of code I felt I hadn't learned to replicate or write from scratch whenever I needed it (and instead simply copy-pasted from a previous exercise, or it was already provided in the IPython shell).