Intermediate Regular Expressions in R

Skill level: Intermediate

Updated 11/2024

Manipulate and analyze text data, and more, by mastering regular expressions and string distances in R.

R, Programming

4 h

14 videos

48 exercises

3,650 XP

4,740

Certificate of completion

Course Description

Analyzing data that comes in tables is pleasant. But what do you do when the information you care about most is not available as a well-structured dataset, only as plain text? Don't panic: in this course, you'll learn everything you need to write powerful regular expressions that pull the information your analyses require out of a simple block of text. And that's not all. Using the concept of string distance, you'll learn to work with text that contains typos or scanning errors by matching it to its correct counterparts from other data sources (record linkage). As learning material, we will analyze real documents on box-office figures from Swiss cinemas.

Prerequisites

Introduction to the Tidyverse
String Manipulation with stringr in R

1

Regular Expressions: Writing Custom Patterns

Regular expressions can be intimidating at first because they contain so many special characters. In this chapter, you'll learn to decipher these characters and write your own patterns to find exactly what you're looking for.

Starts with, ends with

If you don't know what you're looking for

Character classes and repetitions

Digits, words and spaces

Match repetitions

Which special character did what again?

The pipe and the question mark

This or that

The question mark and its two meanings

You can now read this!
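
The pattern elements named in this chapter's lessons (anchors, character classes, repetitions, the pipe, and the question mark) can be sketched with stringr, which the course lists as a prerequisite. This is a minimal illustration, not the course's own exercise code; the sample strings and variable names are invented.

```r
library(stringr)

# Sample strings invented for illustration
titles <- c("Star Wars 7", "The Lion King", "Toy Story 4", "Karate Kid")

# ^ anchors a pattern to the start of the string, $ to the end
starts_with_the <- str_detect(titles, "^The")
ends_with_digit <- str_detect(titles, "\\d$")

# \\w is a word character; + repeats it one or more times
first_word <- str_extract(titles, "^\\w+")

# {4} repeats the preceding element exactly four times
year <- str_extract("Released in 2019", "\\d{4}")

# | means "this or that"; ? makes the preceding element optional
cat_or_dog <- str_detect(c("cats", "dog", "bird"), "cats?|dogs?")

# After a quantifier, ? has a second meaning: match as little as possible
tag_greedy <- str_extract("<b>bold</b>", "<.+>")
tag_lazy <- str_extract("<b>bold</b>", "<.+?>")
```

In R strings the backslash itself must be escaped, which is why the digit class is written `"\\d"` rather than `"\d"`.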

2

Creating Strings with Data

In this chapter, we will step away from regular expressions for a moment and focus on string manipulation: creating strings from other data structures such as vectors or lists.

Getting to know glue

Stop pasting, start gluing

Gluing data frames

How many arguments can glue take?

Collapsing multiple elements into a string

Formulating a question from a list

Collapsing data frames

Glue and Collapse, what's the difference?

Gluing regular expressions

Construct "or patterns" with glue

Using the "or pattern" with a larger dataset

Make advanced patterns more readable
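
As a rough sketch of the ideas in this chapter, the glue package can interpolate values into a string, collapse a vector into a single string, and build an "or pattern" for a regular expression. The variable names and values below are hypothetical and are not taken from the course materials.

```r
library(glue)

# Hypothetical values, invented for illustration
name <- "Anna"
n_films <- 3

# glue() evaluates the R expressions inside {} and inserts the results
sentence <- glue("{name} watched {n_films} films.")

# glue_collapse() joins the elements of a vector into one string
genres <- c("drama", "comedy", "thriller")
listed <- glue_collapse(genres, sep = ", ", last = " and ")

# Collapsing with "|" produces an "or pattern" usable as a regex
or_pattern <- glue_collapse(genres, sep = "|")
has_genre <- grepl(as.character(or_pattern), c("a dark comedy", "a documentary"))
```

The difference between the two functions is that `glue()` returns one string per set of input values, while `glue_collapse()` reduces a whole vector to a single string.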

3

Extracting Structured Data From Text

One task where regular expressions really shine is making sense of a blob of text. In this chapter, you'll learn to extract information from messy data that comes as plain text rather than in neatly arranged tables.

Capturing groups

Match all capturing groups

Search and replace

Can you nest capturing groups?

tidyr's extract

Creating a regex that matches your needs

Why does this fail?

Extracting an advanced regular expression

Extracting matches and surroundings from a text

Extract names with context

So many special characters
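
To make the idea of capturing groups concrete, here is a minimal sketch using stringr and tidyr. The input lines only imitate a box-office listing: the film names, figures, line layout, and column names are all invented for illustration and do not come from the course's documents.

```r
library(stringr)
library(tidyr)

# Invented lines in a made-up layout: "Title (year): N admissions"
lines <- c("Movie One (2019): 540000 admissions",
           "Movie Two (2018): 390000 admissions")

# Each pair of unescaped parentheses is one capturing group;
# \\( and \\) match literal parentheses in the text
pattern <- "^(.+) \\((\\d{4})\\): (\\d+) admissions$"

# str_match() returns a matrix: column 1 is the full match,
# the following columns hold the capturing groups in order
m <- str_match(lines, pattern)

# In a replacement, \\1 and \\2 refer back to the captured groups
swapped <- str_replace(lines, pattern, "\\2: \\1")

# tidyr's extract() turns the groups into data frame columns;
# convert = TRUE converts the numeric-looking columns to numbers
films <- tidyr::extract(
  data.frame(raw = lines),
  raw,
  into = c("title", "year", "admissions"),
  regex = pattern,
  convert = TRUE
)
```

When the pattern fails to match a line, `str_match()` returns a row of `NA` values for it, which is a quick way to find the lines a pattern does not yet cover.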

4

Similarities Between Strings

In the last chapter, we will shift gears from regular expressions to string distances. By calculating the distance between strings, we can match those that are similar. This helps us find duplicates even when they contain small errors such as typos. It is an important part of record linkage, where we combine datasets from multiple sources.

Understanding string distances

Calculating a string distance

Finding a match to a search typo

Methods of string distances

Edit distances vs. q-gram methods

Trying out different methods

Is one distance better than the other?

Fuzzy joins

Performing a string distance join

String distances of short strings

Custom Fuzzy Matching

Finding matches based on two conditions

Why join on multiple columns?

Congratulations
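
The notion of a string distance can be illustrated with the stringdist package. This is a sketch under assumptions: the page does not say which package or settings the course uses, and the strings and the `maxDist` threshold below are invented.

```r
library(stringdist)

# Levenshtein distance: the number of single-character insertions,
# deletions, and substitutions needed to turn one string into the other
lev <- stringdist("kitten", "sitting", method = "lv")  # 3

# A q-gram method instead compares the counts of overlapping
# substrings of length q (here, pairs of characters)
qg <- stringdist("kitten", "sitting", method = "qgram", q = 2)

# amatch() returns the index of the closest candidate within maxDist,
# or NA if none is close enough; useful for matching a typo
candidates <- c("Frozen", "Joker", "Parasite")
best <- amatch("Jokr", candidates, method = "lv", maxDist = 2)
matched <- candidates[best]
```

Edit distances and q-gram distances can rank the same candidates differently, so it is worth trying more than one method on a sample of the data before settling on a threshold.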
