
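Intermediate Regular Expressions in R

Intermediate skill level

Updated 11/2024

Master regular expressions and string distances in R, and build practical skills for manipulating and analyzing text data.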

R, Programming

4 hours

14 videos

48 exercises

3,650 XP

4,740

Certificate of completion

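Course Description

Analyzing tabular data is fun. But what if the information you care about most is available only as plain text rather than in a tidy dataset? Don't worry. In this course, you'll learn from scratch how to write powerful regular expressions that pull the information you need for analysis out of blocks of text. You'll also learn how to use string distances to match text containing typos or scanning errors against the correct candidates in another data source (record linkage). As learning material, you'll analyze real documents about box office results in Swiss movie theaters.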

Prerequisites

Introduction to the Tidyverse

String Manipulation with stringr in R

1

Regular Expressions: Writing Custom Patterns

Regular expressions can be pretty intimidating at first because they contain so many special characters. In this chapter, you'll learn to decipher them and write your own patterns to find exactly what you're looking for.
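
As a rough illustration of the special characters this chapter covers, here is a minimal sketch in base R. The sample titles are invented for this example, and base R's grepl() is used only to keep the sketch free of dependencies; the course exercises may well use other functions.

```r
# Hypothetical sample data, made up for illustration only.
titles <- c("Star Wars", "The Matrix 1999", "Wars of the Stars", "Se7en")

# "^" anchors the pattern to the start of the string, "$" to the end.
starts_with_star <- grepl("^Star", titles)

# "\\d" matches a digit; "{4}" repeats the preceding element exactly four times.
ends_with_year <- grepl("\\d{4}$", titles)

# "|" means "this or that"; "?" makes the preceding element optional.
has_war_or_matrix <- grepl("Wars?|Matrix", titles)

print(titles[starts_with_star])
print(titles[ends_with_year])
print(titles[has_war_or_matrix])
```

Each call returns a logical vector with one entry per title, so it can be used directly to subset the original vector.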

Starts with, ends with

If you don't know what you're looking for

Character classes and repetitions

Digits, words and spaces

Match repetitions

Which special character did what again?

The pipe and the question mark

This or that

The question mark and its two meanings

You can now read this!

2

Creating Strings with Data

In this chapter, we'll step slightly away from regular expressions and focus on string manipulation: creating strings from other data structures such as vectors or lists.

Getting to know glue

Stop pasting, start gluing

Gluing data frames

How many arguments can glue take?

Collapsing multiple elements into a string

Formulating a question from a list

Collapsing data frames

Glue and Collapse, what's the difference?

Gluing regular expressions

Construct "or patterns" with glue

Using the "or pattern" with a larger dataset

Make advanced patterns more readable

3

Extracting Structured Data From Text

One task where regular expressions really shine is making sense of a blob of text. In this chapter, you'll learn to extract information from messy data that comes not in neatly arranged tables but in plain text.
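
To give a feel for extracting structure from plain text, the following sketch uses capturing groups in base R. The input lines, the pattern, and the column names are assumptions made up for this example; the chapter itself also lists tidyr's extract, which is not shown here.

```r
# Hypothetical lines of plain text, invented for this sketch.
lines <- c("Zurich: 1200 tickets", "Geneva: 850 tickets", "no data here")

# Two capturing groups: (city name) and (ticket count).
pattern <- "^([A-Za-z]+): ([0-9]+) tickets$"

# regexec() locates the full match and each group; regmatches() extracts them.
matches <- regmatches(lines, regexec(pattern, lines))

# Keep only lines that matched (a non-match yields a zero-length vector).
matched <- matches[lengths(matches) > 0]

# Element 1 of each result is the full match; elements 2 and 3 are the groups.
result <- data.frame(
  city = vapply(matched, function(m) m[2], character(1)),
  tickets = as.integer(vapply(matched, function(m) m[3], character(1))),
  stringsAsFactors = FALSE
)
print(result)

# Search and replace: "\\1" and "\\2" refer back to the captured groups.
swapped <- sub(pattern, "\\2 tickets sold in \\1", lines[1])
print(swapped)
```

The line that does not fit the pattern is dropped, which is often what you want when only part of a document carries the data of interest.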

Capturing groups

Match all capturing groups

Search and replace

Can you nest capturing groups?

tidyr's extract

Creating a regex that matches your needs

Why does this fail?

Extracting an advanced regular expression

Extracting matches and surroundings from a text

Extract names with context

So many special characters

4

Similarities Between Strings

In the last chapter, we'll shift gears from regular expressions to string distances. By calculating the distance between strings, we can match those that are similar. This helps us find duplicates even when they contain small errors such as typos. It is an important part of record linkage, where we combine datasets from multiple sources.
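
The idea of matching a misspelled string to its closest candidate can be sketched with base R's adist(), which computes a generalized Levenshtein (edit) distance. This is a stand-in chosen to avoid extra packages, not necessarily the method used in the course, and the strings below are invented for illustration.

```r
# Hypothetical data: one misspelled search term and a list of clean candidates.
typo <- "Jurasic Prak"
candidates <- c("Jurassic Park", "Jumanji", "Pulp Fiction")

# adist() returns a matrix of edit distances:
# rows correspond to `typo`, columns to `candidates`.
distances <- utils::adist(typo, candidates)

# The candidate with the smallest distance is the most plausible match.
best_match <- candidates[which.min(distances)]

print(distances)
print(best_match)
```

In practice you would usually also set a maximum acceptable distance, so that a search term with no reasonable counterpart is left unmatched instead of being paired with the least bad candidate.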

Understanding string distances

Calculating a string distance

Finding a match to a search typo

Methods of string distances

Edit distances vs. q-gram methods

Trying out different methods

Is one distance better than the other?

Fuzzy joins

Performing a string distance join

String distances of short strings

Custom Fuzzy Matching

Finding matches based on two conditions

Why join on multiple columns?

Congratulations
