
Course

Web Scraping in R

Skill level: Intermediate

Updated 04/2024

Learn to collect and download data from any website efficiently using R.


R, Data Preparation

4 hours

13 videos

45 exercises

3,600 XP

14,993

Statement of Accomplishment


Course Description

Have you ever come across a website full of data, such as statistics, product reviews, or prices, in a format that is anything but ready for analysis? Government agencies and other providers often publish data in well-organized tables, but not every such site offers a download button. Don't worry: in this course, you'll learn to collect and download data from any website using R. You'll automate scraping and extracting data from Wikipedia with the rvest and httr packages. Through hands-on exercises, you'll also deepen your understanding of HTML and CSS, the building blocks of web pages, while making your data-collection workflows more efficient and less error-prone.

Prerequisites

Intermediate R

Introduction to the Tidyverse

1

Introduction to HTML and Web Scraping

In this chapter, you'll be introduced to HyperText Markup Language (HTML), the declarative markup language used to structure modern websites. Using the rvest package, you'll learn how to query simple HTML elements and scrape your first table.

Introduction to HTML

Read in HTML

Beware of syntax errors!

Navigating HTML

Select all children of a list

Parse hyperlinks into a data frame

Scrape your first table

The right order of table elements

Turn a table into a data frame with html_table()
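The chapter's core workflow can be sketched in a few lines of rvest. The sketch makes three assumptions: rvest 1.0 or later (for `minimal_html()` and `html_elements()`), R 4.1 or later (for the native `|>` pipe), and an invented HTML snippet rather than a real course page. It reads HTML, selects the children of a list, parses hyperlinks into a data frame, and converts a table with `html_table()`.

```r
# Minimal sketch of the Chapter 1 workflow (assumes rvest >= 1.0 and R >= 4.1).
# The HTML string below is an invented example page, not course material.
library(rvest)

page <- minimal_html('
  <ul id="languages">
    <li><a href="https://www.r-project.org">R</a></li>
    <li><a href="https://www.python.org">Python</a></li>
  </ul>
  <table>
    <tr><th>Package</th><th>Purpose</th></tr>
    <tr><td>rvest</td><td>Parse and query HTML</td></tr>
    <tr><td>httr</td><td>Send HTTP requests</td></tr>
  </table>
')

# Select all children of the list: the two <li> elements
list_items <- page |> html_element("ul") |> html_children()

# Parse the hyperlinks into a data frame: link text plus the href attribute
links <- page |> html_elements("li a")
link_df <- data.frame(
  name = html_text(links),
  url  = html_attr(links, "href")
)

# Turn the <table> into a data frame; the row of <th> cells becomes the header
package_table <- page |> html_element("table") |> html_table()
```

With a live page, `read_html("https://...")` would take the place of `minimal_html()`. The selection and extraction calls stay the same.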


2

Navigation and Selection with CSS

Cascading Style Sheets (CSS) describe how HTML elements are displayed on a web page, including colors, fonts, and general layout. In this chapter, you'll learn why CSS selectors and combinators are a crucial ingredient for web scraping.

Introduction to CSS

Select multiple HTML types

Order CSS selectors by the number of results

CSS classes and IDs

Identify the correct selector types

Leverage the uniqueness of IDs

Select the last child with a pseudo-class

CSS combinators

Select direct descendants with the child combinator

How many elements get returned?

Simply the best!

Not every sibling is the same
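The selector types and combinators listed above can be tried on a small, self-contained page. The sketch assumes rvest 1.0 or later and R 4.1 or later, and the menu-style HTML is made up for illustration. It contrasts the descendant and child combinators and chains two class selectors. It also uses the `:last-child` pseudo-class and the adjacent (`+`) and general (`~`) sibling combinators.

```r
# Sketch of CSS selectors and combinators with rvest (assumes rvest >= 1.0).
# The HTML is a made-up example page.
library(rvest)

page <- minimal_html('
  <div id="menu">
    <p class="item">Soup</p>
    <p class="item special">Salad</p>
    <div><p class="item">Nested dessert</p></div>
    <p class="item">Coffee</p>
  </div>
  <h2>Drinks</h2>
  <p>Tea</p>
  <p>Juice</p>
')

# Select multiple HTML types at once with a comma-separated selector list
headings_and_paragraphs <- page |> html_elements("h2, p")

# Descendant combinator (space): every <p> anywhere inside #menu
all_menu_p <- page |> html_elements("#menu p")

# Child combinator (>): only <p> elements that are direct children of #menu
direct_menu_p <- page |> html_elements("#menu > p")

# Two class selectors chained: elements carrying both "item" and "special"
special <- page |> html_elements(".item.special")

# Pseudo-class: a <p> that is the last element child of #menu
last_in_menu <- page |> html_elements("#menu > p:last-child")

# Adjacent sibling (+): the first <p> directly after <h2>
next_sibling <- page |> html_elements("h2 + p")

# General sibling (~): every <p> that follows <h2> at the same level
all_siblings <- page |> html_elements("h2 ~ p")
```

The nested "Nested dessert" paragraph shows how the two combinators differ. It matches `#menu p` but not `#menu > p`, because its parent is the inner `<div>`.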


3

Advanced Selection with XPath

The CSS selectors you got to know in the last chapter are powerful, but they have their limitations. For example, they cannot select nodes based on the properties of their descendants. XPath to the rescue! With this query language, you can navigate and scrape even the messiest HTML.

Introduction to XPath

Find the correct CSS equivalent

Select by class and ID with XPath

Use predicates to select nodes based on their children

XPath functions and advanced predicates

Find a more elegant XPath alternative

Get to know the position() function

Extract nodes based on the number of their children

The XPath text() function

The shortcomings of html_table() with badly structured tables

Select directly from a parent element with XPath's text()

Combine extracted data into a data frame

Scrape an element based on its text
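The XPath techniques in this chapter are sketched below on a small, invented film list. The sketch assumes rvest 1.0 or later, whose `html_elements()` and `html_element()` accept an `xpath` argument, and R 4.1 or later. It selects by ID and class with attribute predicates and by the number of child nodes with `count()`. It also selects by position with `position()` and by text content with `text()`, and finally combines per-item fields into a data frame.

```r
# Sketch of XPath selection with rvest (assumes rvest >= 1.0 and R >= 4.1).
# The HTML is an invented example; the director is missing for one film.
library(rvest)

page <- minimal_html('
  <div id="films">
    <div class="film"><h3>Alien</h3><p>1979</p><p>Ridley Scott</p></div>
    <div class="film"><h3>Heat</h3><p>1995</p></div>
    <div class="film"><h3>Up</h3><p>2009</p><p>Pete Docter</p></div>
  </div>
')

# Select by ID and class with attribute predicates
films <- page |> html_elements(xpath = '//div[@id = "films"]/div[@class = "film"]')

# Predicate on children: titles of films that have exactly two <p> children
two_paragraphs <- page |>
  html_elements(xpath = '//div[@class = "film"][count(p) = 2]/h3') |>
  html_text()

# position(): the first <p> of each film, i.e. the year
years <- page |>
  html_elements(xpath = '//div[@class = "film"]/p[position() = 1]') |>
  html_text()

# text(): select a film by its title, then scrape its <p> content
heat_year <- page |>
  html_elements(xpath = '//div[@class = "film"][h3[text() = "Heat"]]/p') |>
  html_text()

# Combine into a data frame; html_element() yields NA where a node is missing
film_df <- data.frame(
  title    = films |> html_element(xpath = "./h3") |> html_text(),
  year     = films |> html_element(xpath = "./p[1]") |> html_text() |> as.integer(),
  director = films |> html_element(xpath = "./p[2]") |> html_text()
)
```

Extracting field by field with `html_element()` keeps the rows aligned even when a film lacks a director. This is one way around irregular structures where `html_table()` falls short.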


4

Scraping Best Practices

Now that you know how to extract content from web pages, it's time to look behind the curtain. In this final chapter, you'll learn why HTTP requests are the foundation of every scraping action and how to customize them to follow web scraping best practices.

The nature of HTTP requests

Which of these statements about HTTP is false?

Do it the httr way

Houston, we got a 404!

Telling who you are with custom user agents

Check out your user agent

Add a custom user agent

How to be gentle and slow down your requests

Custom arguments for throttled functions

Apply throttling to a multi-page crawler

Recap: Web Scraping in R
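A polite request setup along the lines of this chapter could look like the sketch below. It identifies itself with a custom user agent, stops on HTTP errors such as a 404, and throttles requests with `purrr::slowly()`. Several parts are assumptions made for illustration: the user agent string and the helper names `fetch_page` and `polite_fetch` are hypothetical, and the 2-second pause is an arbitrary choice. The block sends no requests itself. Its last lines only check the throttling mechanism offline.

```r
# Sketch of polite scraping with httr and purrr (assumes both are installed).
library(httr)
library(purrr)

# Hypothetical user agent: identify your scraper and a way to contact you
my_agent <- user_agent("my-course-scraper/0.1 (contact: you@example.com)")

# Hypothetical helper: fetch one URL with the custom user agent and stop on
# HTTP errors (e.g. 404) instead of silently parsing an error page
fetch_page <- function(url) {
  response <- GET(url, my_agent)
  if (http_error(response)) {
    stop("Request to ", url, " failed with HTTP status ", status_code(response))
  }
  content(response, as = "text", encoding = "UTF-8")
}

# Throttled version: slowly() waits `pause` seconds between successive calls
polite_fetch <- slowly(fetch_page, rate = rate_delay(pause = 2))

# Offline check of the throttling mechanism (no network access needed):
# three calls of a trivial throttled function include at least two pauses
throttled_identity <- slowly(identity, rate = rate_delay(pause = 0.5))
elapsed <- system.time(map(1:3, throttled_identity))[["elapsed"]]
```

A multi-page crawler could then map the throttled helper over the URLs it is permitted to scrape, for example `pages <- map(urls, polite_fetch)`. Each request then waits for the previous pause to elapse.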

