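Course

Intermediate Regular Expressions in R

Intermediate skill level

Updated November 2024

Manipulate and analyze text in R with regular expressions and string distances, and take your skills further.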

R, Programming

4 hours

14 videos

48 exercises

3,650 XP

4,740

Statement of Accomplishment

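Course Description

Analyzing tabular data is fun. But what if the most interesting information sits in plain text rather than in a neatly organized dataset? Don't worry. In this course, you'll learn to build powerful regular expressions that find all the information you need for your analysis in blobs of text. On top of that, you'll use the concept of string distances to handle text that contains typos or scanning errors by matching it to the correct entries in another data source (record linkage). As learning material, you'll analyze real documents about the box office results of Swiss cinemas.

Prerequisites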

Introduction to the Tidyverse

String Manipulation with stringr in R

1

Regular Expressions: Writing Custom Patterns

Regular expressions can be intimidating at first because they contain so many special characters. In this chapter, you'll learn to decipher them and write your own patterns to find exactly what you're looking for.
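To give a flavor of the special characters this chapter covers, here is a minimal sketch using stringr (one of the course prerequisites). The movie titles are hypothetical sample data, not the course's dataset.

```r
library(stringr)

# Hypothetical sample data, not taken from the course
movies <- c("Karate Kid 2", "Star Wars", "2001: A Space Odyssey", "The Twilight Saga")

# ^ anchors the pattern to the start of the string, $ to the end
starts_with_digit <- str_detect(movies, "^\\d")
ends_with_digit <- str_detect(movies, "\\d$")

# \\d is the digit character class; {4} repeats it exactly four times
four_digits <- str_extract(movies, "\\d{4}")

# The pipe means "or"; the question mark makes the preceding "s" optional
star_or_saga <- str_detect(movies, "Stars?|Saga")
```

Each result has one element per title, so the logical vectors can be used directly to filter `movies`.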

Starts with, ends with

If you don't know what you're looking for

Character classes and repetitions

Digits, words and spaces

Match repetitions

Which special character did what again?

The pipe and the question mark

This or that

The question mark and its two meanings

You can now read this!

2

Creating Strings with Data

In this chapter, we will move slightly away from regular expressions and focus on string manipulation: creating strings from other data structures such as vectors or lists.
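As a rough illustration of creating strings from data, the sketch below uses `glue()` and `glue_collapse()` from the glue package. The variable names and values are made up for this example.

```r
library(glue)

# Hypothetical values for illustration
name <- "Anna"
n_movies <- 3

# Expressions inside {} are evaluated and inserted into the string
sentence <- glue("{name} watched {n_movies} movies.")

# glue_collapse() joins the elements of a vector into a single string
genres <- c("drama", "comedy", "horror")
genre_list <- glue_collapse(genres, sep = ", ", last = " and ")

# The same idea can build an "or pattern" for a regular expression
or_pattern <- glue_collapse(genres, sep = "|")
```

The collapsed `or_pattern` can then be passed to a function such as `stringr::str_detect()` as a regular expression.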

Getting to know glue

Stop pasting, start gluing

Gluing data frames

How many arguments can glue take?

Collapsing multiple elements into a string

Formulating a question from a list

Collapsing data frames

Glue and Collapse, what's the difference?

Gluing regular expressions

Construct "or patterns" with glue

Using the "or pattern" with a larger dataset

Make advanced patterns more readable

3

Extracting Structured Data From Text

One task where regular expressions really shine is making sense of a blob of text. In this chapter, you'll learn to extract information from messy data that doesn't come in neatly arranged tables but in plain text.
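The following sketch shows one way capturing groups can turn plain text into structured data, using `str_match()` and `str_replace()` from stringr and `extract()` from tidyr. The input lines and column names are hypothetical and only stand in for the kind of text the chapter works with.

```r
library(stringr)
library(tidyr)

# Hypothetical lines of plain text, not the course's box office data
lines <- c("Alien (1979): 117 minutes", "Heat (1995): 170 minutes")

# Each pair of unescaped parentheses is a capturing group:
# group 1 = title, group 2 = year, group 3 = runtime
pattern <- "^(.+) \\((\\d{4})\\): (\\d+) minutes$"

# str_match() returns the full match in column 1 and one column per group
matches <- str_match(lines, pattern)

# Backreferences such as \\1 reuse captured groups in a replacement
reordered <- str_replace(lines, pattern, "\\2 - \\1")

# tidyr's extract() turns the groups into data frame columns
films <- tidyr::extract(
  data.frame(raw = lines),
  col = raw,
  into = c("title", "year", "minutes"),
  regex = pattern,
  convert = TRUE
)
```

Setting `convert = TRUE` asks `extract()` to convert the new columns to suitable types, so `year` and `minutes` become numeric rather than character.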

Capturing groups

Match all capturing groups

Search and replace

Can you nest capturing groups?

tidyr's extract

Creating a regex that matches your needs

Why does this fail?

Extracting an advanced regular expression

Extracting matches and surroundings from a text

Extract names with context

So many special characters

4

Similarities Between Strings

In the last chapter, we will shift gears from regular expressions to string distances. By calculating the distances between strings, we can match those that are similar. This helps us find duplicates even when they contain small errors such as typos. It is an important part of record linkage, where we combine datasets from multiple sources.
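As a small sketch of the idea, the example below computes string distances with the stringdist package and uses `amatch()` to match misspelled entries to a reference list. The titles and typos are invented for illustration, and the distance threshold is an arbitrary assumption.

```r
library(stringdist)

# Levenshtein distance: number of single-character edits between two strings
d_lv <- stringdist("Titanic", "Titanik", method = "lv")

# q-gram distance compares counts of substrings of length q instead of edits
d_qgram <- stringdist("Titanic", "Titanik", method = "qgram", q = 2)

# Hypothetical reference titles and typo-ridden input
titles <- c("Titanic", "Avatar", "Inception")
typed <- c("Titanik", "Avatr", "Incepton")

# amatch() returns the index of the closest title within maxDist
idx <- amatch(typed, titles, method = "lv", maxDist = 2)
matched <- titles[idx]
```

Entries with no reference title within `maxDist` would come back as `NA`, which makes unmatched records easy to spot.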

Understanding string distances

Calculating a string distance

Finding a match to a search typo

Methods of string distances

Edit distances vs. q-gram methods

Trying out different methods

Is one distance better than the other?

Fuzzy joins

Performing a string distance join

String distances of short strings

Custom Fuzzy Matching

Finding matches based on two conditions

Why join on multiple columns?

Congratulations
